[jira] [Commented] (YARN-4228) FileSystemRMStateStore use IOUtils#close instead of fs#close

2015-10-06 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946237#comment-14946237
 ] 

Rohith Sharma K S commented on YARN-4228:
-

+1, LGTM. Could you check the test failures?

> FileSystemRMStateStore use IOUtils#close instead of fs#close
> 
>
> Key: YARN-4228
> URL: https://issues.apache.org/jira/browse/YARN-4228
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4228.patch
>
>
> NPE on {{FileSystemRMStateStore#closeWithRetries}} when active service 
> initialization fails on rm start up
> {noformat}
> 2015-10-05 19:56:38,626 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore 
> failed in state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:721)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:718)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$FSAction.runWithRetries(FileSystemRMStateStore.java:734)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeWithRetries(FileSystemRMStateStore.java:718)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeInternal(FileSystemRMStateStore.java:169)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:618)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:609)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:965)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1195)
> {noformat}
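
For context, the summary's suggestion is to route the close through Hadoop's null-safe {{IOUtils}} helper rather than calling {{fs.close()}} directly. A minimal sketch of that pattern (the {{fs}} field and the {{closeInternal()}} wrapper below are illustrative only, not the actual patch):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.IOUtils;

public class FsCloseSketch {
  private FileSystem fs; // may still be null if the store never fully started

  protected void closeInternal() throws IOException {
    // fs.close() throws NullPointerException when fs was never initialized;
    // IOUtils.closeStream() is null-safe and logs (rather than throws)
    // any IOException raised while closing.
    IOUtils.closeStream(fs);
  }
}
{code}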



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4228) FileSystemRMStateStore use IOUtils#close instead of fs#close

2015-10-06 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946238#comment-14946238
 ] 

Bibin A Chundatt commented on YARN-4228:


[~rohithsharma]
Thanks for looking into this issue. The failures are not related; YARN-3342 is 
already open to track them.

> FileSystemRMStateStore use IOUtils#close instead of fs#close
> 
>
> Key: YARN-4228
> URL: https://issues.apache.org/jira/browse/YARN-4228
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4228.patch
>
>
> NPE on {{FileSystemRMStateStore#closeWithRetries}} when active service 
> initialization fails on rm start up
> {noformat}
> 2015-10-05 19:56:38,626 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore 
> failed in state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:721)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:718)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$FSAction.runWithRetries(FileSystemRMStateStore.java:734)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeWithRetries(FileSystemRMStateStore.java:718)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeInternal(FileSystemRMStateStore.java:169)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:618)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:609)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:965)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1195)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition

2015-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946231#comment-14946231
 ] 

Hudson commented on YARN-4209:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8581 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8581/])
YARN-4209. RMStateStore FENCED state doesn’t work due to (rohithsharmaks: rev 
9156fc60c654e9305411686878acb443f3be1e67)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestMemoryRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* hadoop-yarn-project/CHANGES.txt


> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---
>
> Key: YARN-4209
> URL: https://issues.apache.org/jira/browse/YARN-4209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: YARN-4209.000.patch, YARN-4209.001.patch, 
> YARN-4209.002.patch, YARN-4209.branch-2.7.patch
>
>
> The RMStateStore FENCED state doesn’t work due to {{updateFencedState}} being 
> called by {{stateMachine.doTransition}}. The reason is that the 
> {{stateMachine.doTransition}} call made from {{updateFencedState}} is nested 
> inside the {{stateMachine.doTransition}} call made from the public 
> API ({{removeRMDelegationToken}}, ...) or from {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED, the external state transition changes the state 
> back to ACTIVE. The end result is that the RMStateStore is still in the ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only working 
> case for the FENCED state is {{notifyStoreOperationFailed}} called from 
> {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} => 
> {{updateFencedState}} => {{handleStoreEvent}} => enter internal 
> {{stateMachine.doTransition}} => exit internal {{stateMachine.doTransition}}, 
> changing the state to FENCED => exit external {{stateMachine.doTransition}}, 
> changing the state back to ACTIVE.
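
To make the nesting easier to see, here is a simplified, hypothetical sketch of the problem (not the actual RMStateStore code): the outer {{doTransition()}} writes its own result back after its handler returns, clobbering the FENCED state the nested call just set.

{code}
enum StoreState { ACTIVE, FENCED }

class TinyStateMachine {
  private StoreState current = StoreState.ACTIVE;

  // Simplified doTransition(): decide the next state up front, run the
  // transition handler (which may itself call doTransition()), then write
  // the result back -- which is where the nested result gets lost.
  void doTransition(StoreState target, Runnable handler) {
    StoreState next = target;
    handler.run();
    current = next;
  }

  void removeToken() {
    doTransition(StoreState.ACTIVE, () -> {
      // The store operation fails, notifyStoreOperationFailed() fences:
      doTransition(StoreState.FENCED, () -> { });
      // current == FENCED at this point ...
    });
    // ... but the outer call now resets current back to ACTIVE.
  }
}
{code}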



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition

2015-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946240#comment-14946240
 ] 

Hudson commented on YARN-4209:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #498 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/498/])
YARN-4209. RMStateStore FENCED state doesn’t work due to (rohithsharmaks: rev 
9156fc60c654e9305411686878acb443f3be1e67)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestMemoryRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* hadoop-yarn-project/CHANGES.txt


> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---
>
> Key: YARN-4209
> URL: https://issues.apache.org/jira/browse/YARN-4209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: YARN-4209.000.patch, YARN-4209.001.patch, 
> YARN-4209.002.patch, YARN-4209.branch-2.7.patch
>
>
> RMStateStore FENCED state doesn’t work due to {{updateFencedState}} called by 
> {{stateMachine.doTransition}}. The reason is
> {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded 
> in {{stateMachine.doTransition}} called from public 
> API(removeRMDelegationToken...) or {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED state, the external state transition changes the state 
> back to ACTIVE state. The end result is that RMStateStore is still in ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only working 
> case for FENCED state is {{notifyStoreOperationFailed}} called from 
> {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} 
> =>{{updateFencedState}}=>{{handleStoreEvent}}=> enter internal 
> {{stateMachine.doTransition}} => exit internal {{stateMachine.doTransition}} 
> change state to FENCED => exit external {{stateMachine.doTransition}} change 
> state to ACTIVE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4228) FileSystemRMStateStore use IOUtils#close instead of fs#close

2015-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946269#comment-14946269
 ] 

Hudson commented on YARN-4228:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8582 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8582/])
YARN-4228. FileSystemRMStateStore use IOUtils#close instead of fs#close. 
(rohithsharmaks: rev 3793cbe4c3cce5d03c4a18d562cbcb7cacd8f743)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* hadoop-yarn-project/CHANGES.txt


> FileSystemRMStateStore use IOUtils#close instead of fs#close
> 
>
> Key: YARN-4228
> URL: https://issues.apache.org/jira/browse/YARN-4228
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4228.patch
>
>
> NPE on {{FileSystemRMStateStore#closeWithRetries}} when active service 
> initialization fails on rm start up
> {noformat}
> 2015-10-05 19:56:38,626 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore 
> failed in state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:721)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:718)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$FSAction.runWithRetries(FileSystemRMStateStore.java:734)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeWithRetries(FileSystemRMStateStore.java:718)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeInternal(FileSystemRMStateStore.java:169)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:618)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:609)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:965)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1195)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-261) Ability to kill AM attempts

2015-10-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945087#comment-14945087
 ] 

Jason Lowe commented on YARN-261:
-

Sorry for the late reply.  IIRC the original patch implemented a fail attempt 
rather than a kill attempt because, at the time, that's all the YARN state 
machines supported.  Back then, if an application attempt did not unregister, 
the only option was to treat it as a failure.

If it's easy to add both kill and fail options, that would be great.  If 
implementing kill is complicated, we can get this fail functionality in 
and add kill as a follow-up.

The latest patch looks pretty good besides the whitespace and checkstyle nits.  One 
other nit: it would be nice to reuse a constant final saving transition with 
the AttemptFailedTransition object rather than a unique one every time it's 
needed in the state machine.  Also, the unit tests don't actually cover the most 
common use case, which is failing an attempt that is running. 


> Ability to kill AM attempts
> ---
>
> Key: YARN-261
> URL: https://issues.apache.org/jira/browse/YARN-261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-261.patch, YARN-261--n2.patch, 
> YARN-261--n3.patch, YARN-261--n4.patch, YARN-261--n5.patch, 
> YARN-261--n6.patch, YARN-261--n7.patch, YARN-261.patch
>
>
> It would be nice if clients could ask for an AM attempt to be killed.  This 
> is analogous to the task attempt kill support provided by MapReduce.
> This feature would be useful in a scenario where AM retries are enabled, the 
> AM supports recovery, and a particular AM attempt is stuck.  Currently if 
> this occurs the user's only recourse is to kill the entire application, 
> requiring them to resubmit a new application and potentially breaking 
> downstream dependent jobs if it's part of a bigger workflow.  Killing the 
> attempt would allow a new attempt to be started by the RM without killing the 
> entire application, and if the AM supports recovery it could potentially save 
> a lot of work.  It could also be useful in workflow scenarios where the 
> failure of the entire application kills the workflow, but the ability to kill 
> an attempt can keep the workflow going if the subsequent attempt succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305

2015-10-06 Thread Neelesh Srinivas Salian (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neelesh Srinivas Salian updated YARN-3996:
--
Attachment: YARN-3996.001.patch

Attaching a patch with a test case for the FairScheduler in the AppManager.
Requesting review.

> YARN-789 (Support for zero capabilities in fairscheduler) is broken after 
> YARN-3305
> ---
>
> Key: YARN-3996
> URL: https://issues.apache.org/jira/browse/YARN-3996
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>Priority: Critical
> Attachments: YARN-3996.001.patch, YARN-3996.prelim.patch
>
>
> RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest 
> with minimumResource as the incrementResource. This causes normalize to 
> return zero if the minimum is set to zero as per YARN-789.
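
As a rough illustration (a simplified formula with made-up numbers, not the scheduler's actual normalization code), reusing a zero minimum as the rounding increment collapses every normalized request to zero:

{code}
final class NormalizeSketch {
  // Simplified idea of request normalization: round the request up to a
  // multiple of the increment, then clamp to [minimum, maximum].
  static int normalizeMemory(int requested, int minimum, int maximum, int increment) {
    if (increment <= 0) {
      // When a zero minimum is reused as the increment, every request
      // collapses to zero -- the failure mode described in this issue.
      return 0;
    }
    long rounded = ((long) requested + increment - 1) / increment * (long) increment;
    return (int) Math.min(Math.max(rounded, minimum), maximum);
  }
}
{code}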



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2015-10-06 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3446:

Attachment: (was: YARN-3446.003.patch)

> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because, for the reducer preemption 
> calculation, headRoom takes blacklisted nodes into account. This makes jobs 
> hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResource the AM gets from the RM includes 
> the blacklisted nodes' available resources).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.
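
Conceptually, the fix amounts to subtracting the blacklisted nodes' capacity from the headroom reported to the AM. A rough sketch using the {{Resources}} utility class (the method and its parameters are illustrative, not the actual FairScheduler change):

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

final class HeadroomSketch {
  // clusterAvailable and blacklistedAvailable stand in for values the
  // scheduler would track per application; not the real change.
  static Resource headroomExcludingBlacklist(Resource clusterAvailable,
                                             Resource blacklistedAvailable) {
    Resource headroom = Resources.clone(clusterAvailable);
    // Remove capacity sitting on blacklisted nodes so the AM does not wait
    // forever for resources it can never actually be assigned.
    Resources.subtractFrom(headroom, blacklistedAvailable);
    // Never report negative headroom.
    return Resources.componentwiseMax(headroom, Resources.none());
  }
}
{code}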



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2015-10-06 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3446:

Attachment: YARN-3446.003.patch

> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because, for the reducer preemption 
> calculation, headRoom takes blacklisted nodes into account. This makes jobs 
> hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResource the AM gets from the RM includes 
> the blacklisted nodes' available resources).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition

2015-10-06 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946207#comment-14946207
 ] 

zhihai xu commented on YARN-4209:
-

Thanks [~rohithsharma] for reviewing and committing the patch!

> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---
>
> Key: YARN-4209
> URL: https://issues.apache.org/jira/browse/YARN-4209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: YARN-4209.000.patch, YARN-4209.001.patch, 
> YARN-4209.002.patch, YARN-4209.branch-2.7.patch
>
>
> The RMStateStore FENCED state doesn’t work due to {{updateFencedState}} being 
> called by {{stateMachine.doTransition}}. The reason is that the 
> {{stateMachine.doTransition}} call made from {{updateFencedState}} is nested 
> inside the {{stateMachine.doTransition}} call made from the public 
> API ({{removeRMDelegationToken}}, ...) or from {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED, the external state transition changes the state 
> back to ACTIVE. The end result is that the RMStateStore is still in the ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only working 
> case for the FENCED state is {{notifyStoreOperationFailed}} called from 
> {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} => 
> {{updateFencedState}} => {{handleStoreEvent}} => enter internal 
> {{stateMachine.doTransition}} => exit internal {{stateMachine.doTransition}}, 
> changing the state to FENCED => exit external {{stateMachine.doTransition}}, 
> changing the state back to ACTIVE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-10-06 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946210#comment-14946210
 ] 

zhihai xu commented on YARN-3943:
-

The checkstyle issues and release audit warnings were pre-existing.

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configuration 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It will 
> be better to use two configurations: one is used when disks become full from 
> not-full and the other one is used when disks become not-full from full. So 
> we can avoid oscillating frequently.
> For example: we can set the one for disk-full detection higher than the one 
> for disk-not-full detection.
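
The two-threshold proposal is essentially hysteresis. A minimal sketch of the check (the 90/80 numbers and the field names are made up for illustration; they are not the proposed configuration keys):

{code}
// Hysteresis sketch: a disk is marked full above the high watermark and only
// marked good again once utilization drops below the lower one, so a disk
// hovering around a single threshold does not flip back and forth.
final class DiskStateSketch {
  private static final float FULL_THRESHOLD_PCT = 90f;     // disk-full detection
  private static final float NOT_FULL_THRESHOLD_PCT = 80f; // disk-not-full detection

  private boolean full = false;

  boolean update(float utilizationPct) {
    if (!full && utilizationPct > FULL_THRESHOLD_PCT) {
      full = true;
    } else if (full && utilizationPct < NOT_FULL_THRESHOLD_PCT) {
      full = false;
    }
    return full;
  }
}
{code}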



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4215) RMNodeLabels Manager Need to verify and replace node labels for the only modified Node Label Mappings in the request

2015-10-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945491#comment-14945491
 ] 

Naganarasimha G R commented on YARN-4215:
-

TestFifoScheduler passes locally, and the modifications here do not appear to 
impact that test case.

> RMNodeLabels Manager Need to verify and replace node labels for the only 
> modified Node Label Mappings in the request
> 
>
> Key: YARN-4215
> URL: https://issues.apache.org/jira/browse/YARN-4215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: nodelabel, resourcemanager
> Attachments: YARN-4215.v1.001.patch, YARN-4215.v1.002.patch
>
>
> Modified node labels need to be applied by the capacity scheduler while holding a 
> lock, hence it's better to push events to the scheduler only when there is actually 
> a change in the label mapping for a given node.
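
A rough sketch of the "push only what changed" idea from the summary (hypothetical types and method names, not the patch itself):

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class LabelDiffSketch {
  // Keep only the nodes whose requested label set differs from the current
  // mapping, so the scheduler is only asked to lock and apply real changes.
  static <N> Map<N, Set<String>> onlyModified(Map<N, Set<String>> requested,
                                              Map<N, Set<String>> current) {
    Map<N, Set<String>> changed = new HashMap<>();
    for (Map.Entry<N, Set<String>> e : requested.entrySet()) {
      if (!e.getValue().equals(current.get(e.getKey()))) {
        changed.put(e.getKey(), e.getValue());
      }
    }
    return changed;
  }
}
{code}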



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition

2015-10-06 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4209:

Attachment: YARN-4209.branch-2.7.patch

> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---
>
> Key: YARN-4209
> URL: https://issues.apache.org/jira/browse/YARN-4209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-4209.000.patch, YARN-4209.001.patch, 
> YARN-4209.002.patch, YARN-4209.branch-2.7.patch
>
>
> The RMStateStore FENCED state doesn’t work due to {{updateFencedState}} being 
> called by {{stateMachine.doTransition}}. The reason is that the 
> {{stateMachine.doTransition}} call made from {{updateFencedState}} is nested 
> inside the {{stateMachine.doTransition}} call made from the public 
> API ({{removeRMDelegationToken}}, ...) or from {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED, the external state transition changes the state 
> back to ACTIVE. The end result is that the RMStateStore is still in the ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only working 
> case for the FENCED state is {{notifyStoreOperationFailed}} called from 
> {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} => 
> {{updateFencedState}} => {{handleStoreEvent}} => enter internal 
> {{stateMachine.doTransition}} => exit internal {{stateMachine.doTransition}}, 
> changing the state to FENCED => exit external {{stateMachine.doTransition}}, 
> changing the state back to ACTIVE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4215) RMNodeLabels Manager Need to verify and replace node labels for the only modified Node Label Mappings in the request

2015-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945603#comment-14945603
 ] 

Hudson commented on YARN-4215:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8577 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8577/])
YARN-4215. RMNodeLabels Manager Need to verify and replace node labels (wangda: 
rev 29a582ada0fe195989eca25e5a995895e178f4ea)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


> RMNodeLabels Manager Need to verify and replace node labels for the only 
> modified Node Label Mappings in the request
> 
>
> Key: YARN-4215
> URL: https://issues.apache.org/jira/browse/YARN-4215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: nodelabel, resourcemanager
> Fix For: 2.8.0
>
> Attachments: YARN-4215.v1.001.patch, YARN-4215.v1.002.patch
>
>
> Modified node Labels needs to be updated by the capacity scheduler holding a 
> lock hence its better to push events to scheduler only when there is actually 
> a change in the label mapping for a given node.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945480#comment-14945480
 ] 

Varun Saxena commented on YARN-4178:


OK. But then should we rename the other one to invertLong?

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch, 
> YARN-4178-YARN-2928.04.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition

2015-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945575#comment-14945575
 ] 

Hadoop QA commented on YARN-4209:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765223/YARN-4209.branch-2.7.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | branch-2 / 5453a63 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9362/console |


This message was automatically generated.

> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---
>
> Key: YARN-4209
> URL: https://issues.apache.org/jira/browse/YARN-4209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-4209.000.patch, YARN-4209.001.patch, 
> YARN-4209.002.patch, YARN-4209.branch-2.7.patch
>
>
> The RMStateStore FENCED state doesn’t work due to {{updateFencedState}} being 
> called by {{stateMachine.doTransition}}. The reason is that the 
> {{stateMachine.doTransition}} call made from {{updateFencedState}} is nested 
> inside the {{stateMachine.doTransition}} call made from the public 
> API ({{removeRMDelegationToken}}, ...) or from {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED, the external state transition changes the state 
> back to ACTIVE. The end result is that the RMStateStore is still in the ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only working 
> case for the FENCED state is {{notifyStoreOperationFailed}} called from 
> {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} => 
> {{updateFencedState}} => {{handleStoreEvent}} => enter internal 
> {{stateMachine.doTransition}} => exit internal {{stateMachine.doTransition}}, 
> changing the state to FENCED => exit external {{stateMachine.doTransition}}, 
> changing the state back to ACTIVE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945475#comment-14945475
 ] 

Sangjin Lee commented on YARN-4178:
---

The latest patch looks good to me for the most part. I just had one minor point.

(TimelineStorageUtils.java)
- l.169: I might prefer to name this method a little differently. Currently, 
the right method will be picked depending on the type of the input argument. 
However, I see the possibility of making a mistake and accidentally getting an 
incorrect result. How about differentiating the two more strongly, e.g. {{invertInt()}}?
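
For readers unfamiliar with the pattern, such invert helpers typically just subtract the value from the type's maximum, so that the byte-lexicographic order of the encoded key matches descending numeric order, and applying the helper twice restores the original value. A sketch of what a pair like {{invertLong()}}/{{invertInt()}} usually looks like (illustrative; not necessarily the exact TimelineStorageUtils code):

{code}
final class InvertSketch {
  // MAX_VALUE - key sorts larger original values first in a
  // byte-lexicographic store, and the transformation is its own inverse.
  static long invertLong(long key) {
    return Long.MAX_VALUE - key;
  }

  static int invertInt(int key) {
    return Integer.MAX_VALUE - key;
  }
}
{code}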


> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch, 
> YARN-4178-YARN-2928.04.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945485#comment-14945485
 ] 

Sangjin Lee commented on YARN-4178:
---

That's fine too. I'm comfortable either way (invert/invertInt vs. 
invertLong/invertInt).

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch, 
> YARN-4178-YARN-2928.04.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-10-06 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3943:

Attachment: (was: YARN-3943.001.patch)

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configuration 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It will 
> be better to use two configurations: one is used when disks become full from 
> not-full and the other one is used when disks become not-full from full. So 
> we can avoid oscillating frequently.
> For example: we can set the one for disk-full detection higher than the one 
> for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4127) RM fail with noAuth error if switched from non-failover mode to failover mode

2015-10-06 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4127:
---
Affects Version/s: 2.7.1

> RM fail with noAuth error if switched from non-failover mode to failover mode 
> --
>
> Key: YARN-4127
> URL: https://issues.apache.org/jira/browse/YARN-4127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Jian He
>Assignee: Varun Saxena
> Attachments: YARN-4127.01.patch
>
>
> The scenario is that RM failover was initially enabled, so the zkRootNodeAcl 
> is by default set with the *RM ID* in the ACL string. 
> If RM failover is then switched off, the RM cannot load data from ZK 
> and fails with a noAuth error. After I reset the root node ACL, it can 
> access ZK again.
> {code}
> 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
> {code}
> The problem may be that in non-failover mode the RM doesn't use the *RM-ID* to 
> connect to ZK and thus fails with the noAuth error.
> We should be able to switch failover on and off with no interruption to the 
> user. 
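
For anyone hitting the same noAuth error, "resetting the root node ACL" can be done from zkCli.sh ({{setAcl}}) or programmatically. A hedged sketch with the plain ZooKeeper client, assuming the default state-store parent path /rmstore (yarn.resourcemanager.zk-state-store.parent-path); the wide-open ACL is for illustration only, so pick one appropriate for your deployment:

{code}
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ResetRmStoreAcl {
  public static void main(String[] args) throws Exception {
    CountDownLatch connected = new CountDownLatch(1);
    // Replace zk-host:2181 with your ensemble's connect string.
    ZooKeeper zk = new ZooKeeper("zk-host:2181", 30000,
        event -> connected.countDown());
    connected.await();
    // OPEN_ACL_UNSAFE removes all restrictions on the node;
    // version -1 means "apply regardless of the current ACL version".
    zk.setACL("/rmstore", ZooDefs.Ids.OPEN_ACL_UNSAFE, -1);
    zk.close();
  }
}
{code}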



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-06 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4178:
---
Attachment: YARN-4178-YARN-2928.05.patch

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch, 
> YARN-4178-YARN-2928.04.patch, YARN-4178-YARN-2928.05.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4127) RM fail with noAuth error if switched from non-failover mode to failover mode

2015-10-06 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4127:
---
Component/s: resourcemanager

> RM fail with noAuth error if switched from non-failover mode to failover mode 
> --
>
> Key: YARN-4127
> URL: https://issues.apache.org/jira/browse/YARN-4127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Jian He
>Assignee: Varun Saxena
> Attachments: YARN-4127.01.patch
>
>
> The scenario is that RM failover was initially enabled, so the zkRootNodeAcl 
> is by default set with the *RM ID* in the ACL string. 
> If RM failover is then switched off, the RM cannot load data from ZK 
> and fails with a noAuth error. After I reset the root node ACL, it can 
> access ZK again.
> {code}
> 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
> {code}
> The problem may be that in non-failover mode the RM doesn't use the *RM-ID* to 
> connect to ZK and thus fails with the noAuth error.
> We should be able to switch failover on and off with no interruption to the 
> user. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4231) Node Label should support pluggable storage

2015-10-06 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4231:


 Summary: Node Label should support pluggable storage
 Key: YARN-4231
 URL: https://issues.apache.org/jira/browse/YARN-4231
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan


We need to support pluggable storage for node labels. Currently the store requires a 
filesystem that supports append, and some filesystems don't support append, for 
example the Azure FileSystem. We'd better make the storage implementation pluggable 
so that different filesystems can choose the approach that best suits them for 
implementing node label storage.
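
A pluggable store along these lines is usually captured as a small interface, with a configuration key selecting the backend. A hypothetical sketch (the interface and method names are invented for illustration, not an existing YARN API):

{code}
import java.io.IOException;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.NodeId;

// Hypothetical SPI: each backend (append-capable FS, non-append FS such as
// Azure, ZooKeeper, ...) persists label operations however suits it best.
interface NodeLabelStore {
  void storeNewClusterNodeLabels(Set<String> labels) throws IOException;
  void removeClusterNodeLabels(Set<String> labels) throws IOException;
  void updateNodeToLabelsMappings(Map<NodeId, Set<String>> nodeToLabels)
      throws IOException;
  void recover() throws IOException;
  void close() throws IOException;
}
{code}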



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4230) Increasing container resource while there is no headroom left will cause ResourceManager to crash

2015-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945574#comment-14945574
 ] 

Hadoop QA commented on YARN-4230:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 55s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 39s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 16s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 49s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 33s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  61m 55s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 102m 13s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765197/YARN-4230.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 874c8ed |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9361/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9361/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9361/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9361/console |


This message was automatically generated.

> Increasing container resource while there is no headroom left will cause 
> ResourceManager to crash
> -
>
> Key: YARN-4230
> URL: https://issues.apache.org/jira/browse/YARN-4230
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
>Priority: Critical
> Attachments: YARN-4230.1.patch
>
>
> This issue was found while doing end-to-end test of YARN-1197 in YARN-4175.
> When increasing resource of a container, if there is no headroom left for the 
> user, the ResourceManager crashes with NPE.
> The following is the stack trace:
> {code}
> 15/10/05 20:35:21 INFO capacity.ParentQueue: assignedContainer queue=root 
> usedCapacity=0.9375 absoluteUsedCapacity=0.9375 used= 
> cluster=
> 15/10/05 20:35:49 FATAL resourcemanager.ResourceManager: Error in handling 
> event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.IncreaseContainerAllocator.assignContainers(IncreaseContainerAllocator.java:327)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(ContainerAllocator.java:66)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(FiCaSchedulerApp.java:474)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:819)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:572)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:423)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1177)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1274)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:134)
> at 
> 

[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-10-06 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3943:

Attachment: YARN-3943.001.patch

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configuration 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It will 
> be better to use two configurations: one is used when disks become full from 
> not-full and the other one is used when disks become not-full from full. So 
> we can avoid oscillating frequently.
> For example: we can set the one for disk-full detection higher than the one 
> for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4215) RMNodeLabels Manager Need to verify and replace node labels for the only modified Node Label Mappings in the request

2015-10-06 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4215:

Attachment: YARN-4215.v1.002.patch

Hi [~wangda],
I have updated the test cases; please take a look.

> RMNodeLabels Manager Need to verify and replace node labels for the only 
> modified Node Label Mappings in the request
> 
>
> Key: YARN-4215
> URL: https://issues.apache.org/jira/browse/YARN-4215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: nodelabel, resourcemanager
> Attachments: YARN-4215.v1.001.patch, YARN-4215.v1.002.patch
>
>
> Modified node Labels needs to be updated by the capacity scheduler holding a 
> lock hence its better to push events to scheduler only when there is actually 
> a change in the label mapping for a given node.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI

2015-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944593#comment-14944593
 ] 

Hadoop QA commented on YARN-4162:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 59s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   8m  5s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 28s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 17s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 53s | The applied patch generated  
30 new checkstyle issues (total was 223, now 252). |
| {color:green}+1{color} | whitespace |   0m  3s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 30s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  61m 26s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 102m 51s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765121/YARN-4162.v2.003.patch 
|
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 5f6edb3 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9356/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9356/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9356/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9356/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9356/console |


This message was automatically generated.

> Scheduler info in REST, is currently not displaying partition specific queue 
> information similar to UI
> --
>
> Key: YARN-4162
> URL: https://issues.apache.org/jira/browse/YARN-4162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4162.v1.001.patch, YARN-4162.v2.001.patch, 
> YARN-4162.v2.002.patch, YARN-4162.v2.003.patch, restAndJsonOutput.zip
>
>
> When Node Labels are enabled, the REST Scheduler Information should also 
> provide partition-specific queue information, similar to the existing Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4215) RMNodeLabels Manager Need to verify and replace node labels for the only modified Node Label Mappings in the request

2015-10-06 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4215:

Attachment: (was: YARN-4215.v1.002.patch)

> RMNodeLabels Manager Need to verify and replace node labels for the only 
> modified Node Label Mappings in the request
> 
>
> Key: YARN-4215
> URL: https://issues.apache.org/jira/browse/YARN-4215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: nodelabel, resourcemanager
> Attachments: YARN-4215.v1.001.patch
>
>
> Modified node labels need to be updated by the capacity scheduler while holding 
> a lock, hence it's better to push events to the scheduler only when there is 
> actually a change in the label mapping for a given node.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4228) FileSystemRMStateStore use IOUtils on fs#close

2015-10-06 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4228:
---
Attachment: 0001-YARN-4228.patch

Attaching patch for the same

> FileSystemRMStateStore use IOUtils on fs#close
> --
>
> Key: YARN-4228
> URL: https://issues.apache.org/jira/browse/YARN-4228
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4228.patch
>
>
> NPE on {{FileSystemRMStateStore#closeWithRetries}} when active service 
> initialization fails on rm start up
> {noformat}
> 2015-10-05 19:56:38,626 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore 
> failed in state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:721)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:718)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$FSAction.runWithRetries(FileSystemRMStateStore.java:734)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeWithRetries(FileSystemRMStateStore.java:718)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeInternal(FileSystemRMStateStore.java:169)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:618)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:609)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:965)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1195)
> {noformat}
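For context, the NPE above is the kind of failure a null-safe close avoids. A minimal sketch of what the summary describes (illustrative only, not necessarily the attached patch's exact code):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.IOUtils;

public class NullSafeCloseExample {
  private static final Log LOG = LogFactory.getLog(NullSafeCloseExample.class);

  // fs.close() throws an NPE when the FileSystem was never initialized (for
  // example when active-service init fails before the store ever started).
  // IOUtils.cleanup() skips null arguments and logs close failures instead of
  // propagating them, so shutdown keeps going.
  static void close(FileSystem fs) {
    IOUtils.cleanup(LOG, fs);  // safe even when fs is null
  }
}
{code}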



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4227) FairScheduler: RM quits processing expired container from a removed node

2015-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944602#comment-14944602
 ] 

Hadoop QA commented on YARN-4227:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 20s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m  0s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 13s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 16s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 49s | The applied patch generated  1 
new checkstyle issues (total was 71, now 72). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 27s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  55m 56s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  96m 10s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765128/YARN-4227.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 5f6edb3 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9357/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9357/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9357/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9357/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9357/console |


This message was automatically generated.

> FairScheduler: RM quits processing expired container from a removed node
> 
>
> Key: YARN-4227
> URL: https://issues.apache.org/jira/browse/YARN-4227
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.3.0, 2.5.0, 2.7.1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
> Attachments: YARN-4227.patch
>
>
> Under some circumstances the node is removed before an expired container 
> event is processed, causing the RM to exit:
> {code}
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
> Expired:container_1436927988321_1307950_01_12 Timed out after 600 secs
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1436927988321_1307950_01_12 Container Transitioned from 
> ACQUIRED to EXPIRED
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp: 
> Completed container: container_1436927988321_1307950_01_12 in state: 
> EXPIRED event:EXPIRE
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=system_op   
>OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
> APPID=application_1436927988321_1307950 
> CONTAINERID=container_1436927988321_1307950_01_12
> 2015-10-04 21:14:01,063 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type CONTAINER_EXPIRED to the scheduler
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:849)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1273)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
>   at 
> 

[jira] [Commented] (YARN-4215) RMNodeLabels Manager Need to verify and replace node labels for the only modified Node Label Mappings in the request

2015-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944608#comment-14944608
 ] 

Hadoop QA commented on YARN-4215:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 24s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 56s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 21s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 15s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 51s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 30s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  56m 21s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  97m 47s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765130/YARN-4215.v1.002.patch 
|
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 5f6edb3 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9358/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9358/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9358/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9358/console |


This message was automatically generated.

> RMNodeLabels Manager Need to verify and replace node labels for the only 
> modified Node Label Mappings in the request
> 
>
> Key: YARN-4215
> URL: https://issues.apache.org/jira/browse/YARN-4215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: nodelabel, resourcemanager
> Attachments: YARN-4215.v1.001.patch, YARN-4215.v1.002.patch
>
>
> Modified node labels need to be updated by the capacity scheduler while holding 
> a lock, hence it's better to push events to the scheduler only when there is 
> actually a change in the label mapping for a given node.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-06 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945776#comment-14945776
 ] 

Vrushali C commented on YARN-4178:
--

Patch v5 looks good to me too.

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch, 
> YARN-4178-YARN-2928.04.patch, YARN-4178-YARN-2928.05.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> it is currently treated as a string. This will cause a problem with ordering 
> when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4230) Increasing container resource while there is no headroom left will cause ResourceManager to crash

2015-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4230:
-
Issue Type: Sub-task  (was: Bug)
Parent: YARN-1197

> Increasing container resource while there is no headroom left will cause 
> ResourceManager to crash
> -
>
> Key: YARN-4230
> URL: https://issues.apache.org/jira/browse/YARN-4230
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
>Priority: Critical
> Attachments: YARN-4230.1.patch
>
>
> This issue was found while doing end-to-end testing of YARN-1197 in YARN-4175.
> When increasing the resource of a container, if there is no headroom left for 
> the user, the ResourceManager crashes with an NPE.
> The following is the stack trace:
> {code}
> 15/10/05 20:35:21 INFO capacity.ParentQueue: assignedContainer queue=root 
> usedCapacity=0.9375 absoluteUsedCapacity=0.9375 used= 
> cluster=
> 15/10/05 20:35:49 FATAL resourcemanager.ResourceManager: Error in handling 
> event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.IncreaseContainerAllocator.assignContainers(IncreaseContainerAllocator.java:327)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(ContainerAllocator.java:66)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(FiCaSchedulerApp.java:474)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:819)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:572)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:423)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1177)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1274)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:134)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:691)
> at java.lang.Thread.run(Thread.java:745)
> 15/10/05 20:35:49 INFO resourcemanager.ResourceManager: Exiting, bbye..
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945681#comment-14945681
 ] 

Hadoop QA commented on YARN-3943:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 13s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m  3s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 39s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 15s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 29s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   4m 33s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   2m  2s | Tests failed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |   8m 45s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  56m  0s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
| Failed unit tests | hadoop.yarn.logaggregation.TestAggregatedLogsBlock |
|   | hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765231/YARN-3943.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 29a582a |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9364/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9364/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9364/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9364/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9364/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9364/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9364/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9364/console |


This message was automatically generated.

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configurations 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It would 
> be better to use two configurations: one for when a disk goes from not-full to 
> full and the other for when it goes from full back to not-full, so we can 
> avoid frequent oscillation.
> For example, we can set the threshold for disk-full detection higher than the 
> one for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-10-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945717#comment-14945717
 ] 

Wangda Tan commented on YARN-3216:
--

[~sunilg],
bq. In FiCaSchedulerApp's ctor, update AM-Resource-Request.resource on 
partition ( keep existing code). But use 
rmApp.getAMResourceRequest().getNodeLabelExpression() to setAMResource instead 
of setting to NO_LABEL. Because this information wont be changed later.
Makes sense to me.

bq. Here AM-Resource-Request.resource is updated in FiCaSchedulerApp's ctor 
based on rmApp.getAMResourceRequest. Once container is allocated, this resource 
will become a part of the partition with no change in resource. So I feel I 
need not have to update resource in allocate call of FicaSchedulerApp. Am I 
correct?
I think so. Maybe we also need to consider container increase, but I think it's 
an edge case; we may or may not need to handle it.

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.
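A rough sketch of a per-partition computation (the variable and helper names are illustrative, not the patch):

{code}
// Illustrative only: compute the AM resource limit against the resource of the
// partition the AM actually runs in, rather than always against the default
// partition.
Resource partitionResource =
    labelManager.getResourceByLabel(amNodeLabelExpression, clusterResource);
Resource amLimitForPartition = Resources.multiply(
    partitionResource, queueCapacityOnPartition * maxAMResourcePercentage);
{code}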



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-10-06 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3943:

Attachment: (was: YARN-3943.001.patch)

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configurations 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It would 
> be better to use two configurations: one for when a disk goes from not-full to 
> full and the other for when it goes from full back to not-full, so we can 
> avoid frequent oscillation.
> For example, we can set the threshold for disk-full detection higher than the 
> one for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-10-06 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3943:

Attachment: YARN-3943.001.patch

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configurations 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It would 
> be better to use two configurations: one for when a disk goes from not-full to 
> full and the other for when it goes from full back to not-full, so we can 
> avoid frequent oscillation.
> For example, we can set the threshold for disk-full detection higher than the 
> one for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4215) RMNodeLabels Manager Need to verify and replace node labels for the only modified Node Label Mappings in the request

2015-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945753#comment-14945753
 ] 

Hudson commented on YARN-4215:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #1225 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1225/])
YARN-4215. RMNodeLabels Manager Need to verify and replace node labels (wangda: 
rev 29a582ada0fe195989eca25e5a995895e178f4ea)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java
* hadoop-yarn-project/CHANGES.txt


> RMNodeLabels Manager Need to verify and replace node labels for the only 
> modified Node Label Mappings in the request
> 
>
> Key: YARN-4215
> URL: https://issues.apache.org/jira/browse/YARN-4215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: nodelabel, resourcemanager
> Fix For: 2.8.0
>
> Attachments: YARN-4215.v1.001.patch, YARN-4215.v1.002.patch
>
>
> Modified node labels need to be updated by the capacity scheduler while holding 
> a lock, hence it's better to push events to the scheduler only when there is 
> actually a change in the label mapping for a given node.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945643#comment-14945643
 ] 

Hadoop QA commented on YARN-4178:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 38s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   8m 16s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 18s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 17s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  5s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 41s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 55s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   3m 32s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  42m 47s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765222/YARN-4178-YARN-2928.05.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 09c3576 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9363/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9363/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9363/console |


This message was automatically generated.

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch, 
> YARN-4178-YARN-2928.04.patch, YARN-4178-YARN-2928.05.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> it is currently treated as a string. This will cause a problem with ordering 
> when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945750#comment-14945750
 ] 

Sangjin Lee commented on YARN-4178:
---

The latest patch LGTM. I'll wait a little while before I commit to give other 
folks a chance to chime in.

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch, 
> YARN-4178-YARN-2928.04.patch, YARN-4178-YARN-2928.05.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> it is currently treated as a string. This will cause a problem with ordering 
> when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4215) RMNodeLabels Manager Need to verify and replace node labels for the only modified Node Label Mappings in the request

2015-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945811#comment-14945811
 ] 

Hudson commented on YARN-4215:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #496 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/496/])
YARN-4215. RMNodeLabels Manager Need to verify and replace node labels (wangda: 
rev 29a582ada0fe195989eca25e5a995895e178f4ea)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java


> RMNodeLabels Manager Need to verify and replace node labels for the only 
> modified Node Label Mappings in the request
> 
>
> Key: YARN-4215
> URL: https://issues.apache.org/jira/browse/YARN-4215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: nodelabel, resourcemanager
> Fix For: 2.8.0
>
> Attachments: YARN-4215.v1.001.patch, YARN-4215.v1.002.patch
>
>
> Modified node labels need to be updated by the capacity scheduler while holding 
> a lock, hence it's better to push events to the scheduler only when there is 
> actually a change in the label mapping for a given node.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4218) Metric for resource*time that was preempted

2015-10-06 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4218:
---
Attachment: YARN-4218.2.patch

> Metric for resource*time that was preempted
> ---
>
> Key: YARN-4218
> URL: https://issues.apache.org/jira/browse/YARN-4218
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4218.2.patch, YARN-4218.2.patch, YARN-4218.patch, 
> YARN-4218.wip.patch, screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> After YARN-415 we have the ability to track the resource*time footprint of a 
> job, and the preemption metrics show how many containers were preempted on a 
> job. However, we don't have a metric showing the resource*time footprint cost of 
> preemption. In other words, we know how many containers were preempted but we 
> don't have a good measure of how much work was lost as a result of preemption.
> We should add this metric so we can analyze how much work preemption is 
> costing on a grid and better track which jobs were heavily impacted by it. A 
> job that has 100 containers preempted that only lasted a minute each and were 
> very small is going to be less impacted than a job that only lost a single 
> container but that container was huge and had been running for 3 days.
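As a rough sketch of the kind of accounting the description asks for (the field and variable names are illustrative, not the attached patch):

{code}
// Illustrative sketch: on preemption, accumulate how much resource*time the
// preempted container had already consumed, in MB-seconds and vcore-seconds.
long usedSeconds = (preemptionTimeMs - containerStartTimeMs) / 1000;
preemptedMemoryMBSeconds += containerResource.getMemory() * usedSeconds;
preemptedVcoreSeconds += containerResource.getVirtualCores() * usedSeconds;
{code}

Under this metric, 100 small one-minute containers add far less than one large container that had been running for days, which matches the impact described above.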



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4215) RMNodeLabels Manager Need to verify and replace node labels for the only modified Node Label Mappings in the request

2015-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945887#comment-14945887
 ] 

Hudson commented on YARN-4215:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2431 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2431/])
YARN-4215. RMNodeLabels Manager Need to verify and replace node labels (wangda: 
rev 29a582ada0fe195989eca25e5a995895e178f4ea)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java


> RMNodeLabels Manager Need to verify and replace node labels for the only 
> modified Node Label Mappings in the request
> 
>
> Key: YARN-4215
> URL: https://issues.apache.org/jira/browse/YARN-4215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: nodelabel, resourcemanager
> Fix For: 2.8.0
>
> Attachments: YARN-4215.v1.001.patch, YARN-4215.v1.002.patch
>
>
> Modified node labels need to be updated by the capacity scheduler while holding 
> a lock, hence it's better to push events to the scheduler only when there is 
> actually a change in the label mapping for a given node.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2015-10-06 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3446:

Attachment: YARN-3446.003.patch

> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because for Reducer preemption 
> calculation, headRoom is considering blacklisted nodes. This makes jobs 
> hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResource the AM gets from the RM includes 
> the blacklisted nodes' available resources).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.
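A minimal sketch of the adjustment the summary describes (the collection and variable names are illustrative; this is not the attached patch):

{code}
// Illustrative sketch: subtract the available resources of blacklisted nodes
// from the headroom reported to the AM, so the AM does not count capacity it
// can never actually be given.
Resource blacklistedAvailable = Resource.newInstance(0, 0);
for (FSSchedulerNode node : blacklistedNodes) {  // hypothetical per-app blacklist view
  Resources.addTo(blacklistedAvailable, node.getAvailableResource());
}
Resource headroom = Resources.componentwiseMax(
    Resource.newInstance(0, 0),
    Resources.subtract(queueAvailable, blacklistedAvailable));
{code}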



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API

2015-10-06 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945898#comment-14945898
 ] 

Jonathan Eagles commented on YARN-4009:
---

[~vvasudev], I'm wondering one more thing. Instead of adding logic for 
disabling the log message, I was thinking about changing the logic in 
ApplicationHistoryServer. It seems like we could just modify the logic below to 
avoid double-adding the cross-origin filter. What do you think?

{code}
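// Only append the CrossOriginFilterInitializer when cross-origin support is
// enabled and it is not already among the configured initializers, so the
// filter is never added twice.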
if (!initializers.contains(CrossOriginFilterInitializer.class.getName())) {
  if (conf.getBoolean(YarnConfiguration
      .TIMELINE_SERVICE_HTTP_CROSS_ORIGIN_ENABLED, YarnConfiguration
      .TIMELINE_SERVICE_HTTP_CROSS_ORIGIN_ENABLED_DEFAULT)) {
    if (initializers.length() != 0) {
      initializers += ",";
    }
    initializers += CrossOriginFilterInitializer.class.getName();
    modifiedInitializers = true;
  }
}

{code}

> CORS support for ResourceManager REST API
> -
>
> Key: YARN-4009
> URL: https://issues.apache.org/jira/browse/YARN-4009
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Varun Vasudev
> Attachments: YARN-4009.001.patch, YARN-4009.002.patch, 
> YARN-4009.003.patch, YARN-4009.004.patch, YARN-4009.005.patch, 
> YARN-4009.006.patch
>
>
> Currently the REST API's do not have CORS support. This means any UI (running 
> in browser) cannot consume the REST API's. For ex Tez UI would like to use 
> the REST API for getting application, application attempt information exposed 
> by the API's. 
> It would be very useful if CORS is enabled for the REST API's.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2015-10-06 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3446:

Attachment: (was: YARN-3446.003.patch)

> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because for Reducer preemption 
> calculation, headRoom is considering blacklisted nodes. This makes jobs 
> hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResource the AM gets from the RM includes 
> the blacklisted nodes' available resources).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-462) Project Parameter for Chargeback

2015-10-06 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945933#comment-14945933
 ] 

Ruslan Dautkhanov commented on YARN-462:


It's probably related to https://issues.apache.org/jira/browse/YARN-415 

> Project Parameter for Chargeback
> 
>
> Key: YARN-462
> URL: https://issues.apache.org/jira/browse/YARN-462
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 0.23.6
>Reporter: Kendall Thrapp
>
> Problem Summary
> For the purpose of chargeback and better understanding of grid usage, we need 
> to be able to associate applications with "projects", e.g. "pipeline X", 
> "property Y".  This would allow us to aggregate on this property, thereby 
> helping us compute grid resource usage for the entire "project".  Currently, 
> for a given application, two things we know about it are the user that 
> submitted it and the queue it was submitted to.  Below, I'll explain why 
> neither of these is adequate for enterprise-level chargeback and 
> understanding resource allocation needs.
> Why Not Users?
> It's not individual users that are paying the bill -- it's projects.  When one 
> of our real users submits an application on a Hadoop grid, they're presumably 
> not usually doing it for themselves.  They're doing work for some project or 
> team effort, so it's that team or project that should be "charged" for all its 
> users' applications.  Maintaining outside lists of associations between users 
> and projects is error-prone because it is time-sensitive and requires 
> continued ongoing maintenance.  New users join organizations, users leave and 
> users even change projects.  Furthermore, users may split their time between 
> multiple projects, making it ambiguous as to which of a user's projects a 
> given application should be charged.  Also, there can be headless users, 
> which can be even more difficult to link to a project and can be shared 
> between teams or projects.
> Why Not Queues?
> The purpose of queues is for scheduling.  Overloading the queues concept to 
> also mean who should be "charged" for an application can have a detrimental 
> effect on the primary purpose of queues.  It could be manageable in the case 
> of a very small number of projects sharing a cluster, but doesn't scale to 
> tens or hundreds of projects sharing a cluster.  If a given cluster is shared 
> between 50 projects, creating 50 separate queues will result in inefficient 
> use of the cluster resources.  Furthermore, a given project may desire more 
> than one queue for different types or priorities of applications.  
> Proposed Solution
> Rather than relying on external tools to infer through the user and/or queue 
> who to "charge" for a given application, I propose a straightforward approach 
> where that information be explicitly supplied when the application is 
> submitted, just like we do with queues.  Let's use a charge card analogy: 
> when you buy something online, you don't just say who you are and how to ship 
> it, you also specify how you're paying for it.  Similarly, when submitting an 
> application in YARN, you could explicitly specify to whom its resource usage 
> should be associated (a project, team, cost center, etc).
> This new configuration parameter should default to being optional, so that 
> organizations not interested in chargeback or project-level resource tracking 
> can happily continue on as if it wasn't there.  However, it should be 
> configurable at the cluster level such that a given cluster could elect 
> to make it required, so that all applications would have an associated 
> project.  The value of this new parameter should be exposed via the Resource 
> Manager UI and Resource Manager REST API, so that users and tools can make 
> use of it for chargeback, utilization metrics, etc.
> I'm undecided on what to name the new parameter, as I like the flexibility in 
> the ways it could be used.  It is essentially just an additional party other 
> than user or queue that an application can be associated with, so its use is 
> not just limited to a chargeback scenario.  For example, an organization not 
> interested in chargeback could still use this parameter to communicate useful 
> information about an application (e.g. pipelineX.stageN) and aggregate like 
> applications.
> Enforcement
> Couldn't users just specify this information as a prefix for their job names? 
>  Yes, but the missing piece this provides is enforcement.  Ideally, I'd 
> like this parameter to work very much like how the queues work.  As already 
> exists with queues, it'd be ideal if a given user couldn't just specify any 
> old value for this parameter.  It could be configurable such that a given 
> user only has permission to submit 

[jira] [Commented] (YARN-4228) FileSystemRMStateStore use IOUtils on fs#close

2015-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944670#comment-14944670
 ] 

Hadoop QA commented on YARN-4228:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m  2s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 12s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 16s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 49s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 25s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  55m 49s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  95m 34s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765137/0001-YARN-4228.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 5f6edb3 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9360/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9360/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9360/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9360/console |


This message was automatically generated.

> FileSystemRMStateStore use IOUtils on fs#close
> --
>
> Key: YARN-4228
> URL: https://issues.apache.org/jira/browse/YARN-4228
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4228.patch
>
>
> NPE on {{FileSystemRMStateStore#closeWithRetries}} when active service 
> initialization fails on rm start up
> {noformat}
> 2015-10-05 19:56:38,626 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore 
> failed in state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:721)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:718)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$FSAction.runWithRetries(FileSystemRMStateStore.java:734)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeWithRetries(FileSystemRMStateStore.java:718)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeInternal(FileSystemRMStateStore.java:169)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:618)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:609)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
> at 
> 

[jira] [Commented] (YARN-4224) Change the REST interface to conform to current REST APIs' in YARN

2015-10-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944644#comment-14944644
 ] 

Varun Saxena commented on YARN-4224:


All these are new APIs added as part of ATSv2, and ATSv2 is marked to go in 2.8.
Moreover, v2 here indicates version 2 of ATS.

> Change the REST interface to conform to current REST APIs' in YARN
> --
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-10-06 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4224:
---
Summary: Change the ATSv2 reader side REST interface to conform to current 
REST APIs' in YARN  (was: Change the REST interface to conform to current REST 
APIs' in YARN)

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4228) FileSystemRMStateStore use IOUtils.close instead of fs#close

2015-10-06 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4228:
---
Summary: FileSystemRMStateStore use IOUtils.close instead of fs#close  
(was: FileSystemRMStateStore use IOUtils on fs#close)

> FileSystemRMStateStore use IOUtils.close instead of fs#close
> 
>
> Key: YARN-4228
> URL: https://issues.apache.org/jira/browse/YARN-4228
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4228.patch
>
>
> NPE on {{FileSystemRMStateStore#closeWithRetries}} when active service 
> initialization fails on rm start up
> {noformat}
> 2015-10-05 19:56:38,626 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore 
> failed in state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:721)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:718)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$FSAction.runWithRetries(FileSystemRMStateStore.java:734)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeWithRetries(FileSystemRMStateStore.java:718)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeInternal(FileSystemRMStateStore.java:169)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:618)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:609)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:965)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1195)
> {noformat}
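For illustration only (this is not the attached patch), the direction the summary suggests is a null-safe close: a helper such as org.apache.hadoop.io.IOUtils#closeStream ignores a null handle, so serviceStop can run even when serviceInit failed before the FileSystem was created. A minimal sketch, with a hypothetical field standing in for the store's fs handle:

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.IOUtils;

public class CloseSketch {
  // Hypothetical field standing in for FileSystemRMStateStore's fs handle.
  private FileSystem fs;

  // Calling fs.close() directly throws NPE if init failed before fs was set.
  void closeUnsafe() throws IOException {
    fs.close();
  }

  // Null-safe variant: closeStream skips null and swallows close exceptions,
  // so a failed active-service init no longer breaks serviceStop.
  void closeSafe() {
    IOUtils.closeStream(fs);
  }
}
{code}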



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4228) FileSystemRMStateStore use IOUtils#close instead of fs#close

2015-10-06 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4228:
---
Summary: FileSystemRMStateStore use IOUtils#close instead of fs#close  
(was: FileSystemRMStateStore use IOUtils.close instead of fs#close)

> FileSystemRMStateStore use IOUtils#close instead of fs#close
> 
>
> Key: YARN-4228
> URL: https://issues.apache.org/jira/browse/YARN-4228
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4228.patch
>
>
> NPE on {{FileSystemRMStateStore#closeWithRetries}} when active service 
> initialization fails on rm start up
> {noformat}
> 2015-10-05 19:56:38,626 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore 
> failed in state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:721)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:718)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$FSAction.runWithRetries(FileSystemRMStateStore.java:734)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeWithRetries(FileSystemRMStateStore.java:718)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeInternal(FileSystemRMStateStore.java:169)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:618)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:609)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:965)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1195)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition

2015-10-06 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945553#comment-14945553
 ] 

zhihai xu commented on YARN-4209:
-

thanks [~rohithsharma]! Yes, I attached the patch YARN-4209.branch-2.7.patch 
for branch-2.7.

> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---
>
> Key: YARN-4209
> URL: https://issues.apache.org/jira/browse/YARN-4209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-4209.000.patch, YARN-4209.001.patch, 
> YARN-4209.002.patch, YARN-4209.branch-2.7.patch
>
>
> RMStateStore FENCED state doesn’t work due to {{updateFencedState}} called by 
> {{stateMachine.doTransition}}. The reason is that the 
> {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded 
> in the {{stateMachine.doTransition}} called from a public 
> API (removeRMDelegationToken...) or {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED state, the external state transition changes the state 
> back to ACTIVE state. The end result is that RMStateStore is still in ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only working 
> case for FENCED state is {{notifyStoreOperationFailed}} called from 
> {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} 
> =>{{updateFencedState}}=>{{handleStoreEvent}}=> enter internal 
> {{stateMachine.doTransition}} => exit internal {{stateMachine.doTransition}} 
> change state to FENCED => exit external {{stateMachine.doTransition}} change 
> state to ACTIVE.
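To make the nesting easier to see, here is a minimal sketch (hypothetical names, not the RMStateStore code) of how an inner transition's FENCED result gets overwritten when the outer transition returns:

{code}
public class NestedTransitionSketch {
  enum State { ACTIVE, FENCED }

  private State current = State.ACTIVE;

  // Outer transition, e.g. triggered by removeRMDelegationToken.
  void outerTransition() {
    State result = State.ACTIVE;     // transition body decides the next state
    notifyStoreOperationFailed();    // ... but internally runs another transition
    current = result;                // overwrites FENCED set by the inner transition
  }

  // Inner transition, e.g. updateFencedState -> handleStoreEvent.
  void notifyStoreOperationFailed() {
    current = State.FENCED;          // takes effect only until the outer
  }                                  // transition finishes and re-assigns current

  public static void main(String[] args) {
    NestedTransitionSketch s = new NestedTransitionSketch();
    s.outerTransition();
    System.out.println(s.current);   // prints ACTIVE, not FENCED
  }
}
{code}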



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4215) RMNodeLabels Manager Need to verify and replace node labels for the only modified Node Label Mappings in the request

2015-10-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945558#comment-14945558
 ] 

Wangda Tan commented on YARN-4215:
--

Patch looks good, committing...

> RMNodeLabels Manager Need to verify and replace node labels for the only 
> modified Node Label Mappings in the request
> 
>
> Key: YARN-4215
> URL: https://issues.apache.org/jira/browse/YARN-4215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: nodelabel, resourcemanager
> Attachments: YARN-4215.v1.001.patch, YARN-4215.v1.002.patch
>
>
> Modified node labels need to be updated by the capacity scheduler while holding a 
> lock, hence it's better to push events to the scheduler only when there is actually 
> a change in the label mapping for a given node.
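A rough sketch of that idea (hypothetical names, not the committed patch): compare the old and new label sets and only dispatch the scheduler event when the mapping actually changed:

{code}
import java.util.Set;

public class LabelUpdateSketch {
  interface SchedulerEvents {
    void sendNodeLabelsUpdateEvent(String node, Set<String> newLabels);
  }

  // Only push the (lock-taking) scheduler update when the labels really changed.
  static void replaceLabelsOnNode(String node, Set<String> oldLabels,
      Set<String> newLabels, SchedulerEvents scheduler) {
    if (oldLabels.equals(newLabels)) {
      return;                        // no change, skip the scheduler event
    }
    scheduler.sendNodeLabelsUpdateEvent(node, newLabels);
  }
}
{code}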



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4215) RMNodeLabels Manager Need to verify and replace node labels for the only modified Node Label Mappings in the request

2015-10-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946165#comment-14946165
 ] 

Naganarasimha G R commented on YARN-4215:
-

Thanks [~wangda] for reviewing and committing the patch

> RMNodeLabels Manager Need to verify and replace node labels for the only 
> modified Node Label Mappings in the request
> 
>
> Key: YARN-4215
> URL: https://issues.apache.org/jira/browse/YARN-4215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: nodelabel, resourcemanager
> Fix For: 2.8.0
>
> Attachments: YARN-4215.v1.001.patch, YARN-4215.v1.002.patch
>
>
> Modified node labels need to be updated by the capacity scheduler while holding a 
> lock, hence it's better to push events to the scheduler only when there is actually 
> a change in the label mapping for a given node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4215) RMNodeLabels Manager Need to verify and replace node labels for the only modified Node Label Mappings in the request

2015-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946038#comment-14946038
 ] 

Hudson commented on YARN-4215:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #462 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/462/])
YARN-4215. RMNodeLabels Manager Need to verify and replace node labels (wangda: 
rev 29a582ada0fe195989eca25e5a995895e178f4ea)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


> RMNodeLabels Manager Need to verify and replace node labels for the only 
> modified Node Label Mappings in the request
> 
>
> Key: YARN-4215
> URL: https://issues.apache.org/jira/browse/YARN-4215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: nodelabel, resourcemanager
> Fix For: 2.8.0
>
> Attachments: YARN-4215.v1.001.patch, YARN-4215.v1.002.patch
>
>
> Modified node labels need to be updated by the capacity scheduler while holding a 
> lock, hence it's better to push events to the scheduler only when there is actually 
> a change in the label mapping for a given node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4215) RMNodeLabels Manager Need to verify and replace node labels for the only modified Node Label Mappings in the request

2015-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946082#comment-14946082
 ] 

Hudson commented on YARN-4215:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2401 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2401/])
YARN-4215. RMNodeLabels Manager Need to verify and replace node labels (wangda: 
rev 29a582ada0fe195989eca25e5a995895e178f4ea)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java


> RMNodeLabels Manager Need to verify and replace node labels for the only 
> modified Node Label Mappings in the request
> 
>
> Key: YARN-4215
> URL: https://issues.apache.org/jira/browse/YARN-4215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: nodelabel, resourcemanager
> Fix For: 2.8.0
>
> Attachments: YARN-4215.v1.001.patch, YARN-4215.v1.002.patch
>
>
> Modified node labels need to be updated by the capacity scheduler while holding a 
> lock, hence it's better to push events to the scheduler only when there is actually 
> a change in the label mapping for a given node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4218) Metric for resource*time that was preempted

2015-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946090#comment-14946090
 ] 

Hadoop QA commented on YARN-4218:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  22m 27s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 9 new or modified test files. |
| {color:green}+1{color} | javac |   8m  0s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 22s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 15s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 43s | The applied patch generated  3 
new checkstyle issues (total was 10, now 13). |
| {color:red}-1{color} | whitespace |   0m 19s | The patch has 10  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   7m 45s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   6m 59s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  3s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   3m 17s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:green}+1{color} | yarn tests |   0m 27s | Tests passed in 
hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |  17m 29s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  85m 43s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestApplicationCleanup 
|
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication
 |
|   | hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
|   | hadoop.yarn.server.resourcemanager.TestMoveApplication |
|   | hadoop.yarn.server.resourcemanager.TestApplicationMasterService |
|   | hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler |
|   | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
 |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions
 |
|   | hadoop.yarn.server.resourcemanager.monitor.TestSchedulingMonitor |
|   | hadoop.yarn.server.resourcemanager.TestRMAdminService |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication |
|   | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId |
|   | hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched |
|   | hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens |
|   | hadoop.yarn.server.resourcemanager.TestApplicationACLs |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority |
|   | hadoop.yarn.server.resourcemanager.TestRMHAForNodeLabels |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore |
|   | 
hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate
 |
|   | hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
|   | hadoop.yarn.server.resourcemanager.resourcetracker.TestNMReconnect |
|   | hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA |
|   | 

[jira] [Commented] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2015-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946117#comment-14946117
 ] 

Hadoop QA commented on YARN-3446:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 36s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   9m  7s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 48s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 16s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 56s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 39s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 41s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  57m 47s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 103m 34s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765274/YARN-3446.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6d5713a |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9367/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9367/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9367/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9367/console |


This message was automatically generated.

> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because, for the reducer preemption 
> calculation, headRoom is considering blacklisted nodes. This makes jobs hang 
> forever (the ResourceManager does not assign any new containers on blacklisted 
> nodes, but the availableResource the AM gets from the RM includes the 
> blacklisted nodes' available resources).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.
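A simplified sketch of the intended calculation (plain numbers rather than the scheduler's Resource objects, and hypothetical names): subtract the capacity of blacklisted nodes before reporting headroom to the AM:

{code}
import java.util.Map;
import java.util.Set;

public class HeadroomSketch {
  // Headroom the AM should see = cluster headroom - resources it cannot use
  // because they sit on nodes the AM has blacklisted.
  static long usableHeadroomMb(long clusterHeadroomMb,
      Set<String> blacklistedNodes, Map<String, Long> availableMbPerNode) {
    long blacklistedMb = 0;
    for (String node : blacklistedNodes) {
      blacklistedMb += availableMbPerNode.getOrDefault(node, 0L);
    }
    return Math.max(0, clusterHeadroomMb - blacklistedMb);
  }
}
{code}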



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-462) Project Parameter for Chargeback

2015-10-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945950#comment-14945950
 ] 

Jason Lowe commented on YARN-462:
-

It is related to YARN-415 but not implemented by it.  YARN-415 adds the ability 
to know the aggregate resource usage of an application, but it doesn't help 
associate that application with a project or business entity.  To use Kendall's 
analogy from above, YARN-415 lets us know how much an app costs but not who's 
supposed to be paying for it.  One would think that would be the user, but in 
many cluster setups a single user can run jobs for multiple projects.


> Project Parameter for Chargeback
> 
>
> Key: YARN-462
> URL: https://issues.apache.org/jira/browse/YARN-462
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 0.23.6
>Reporter: Kendall Thrapp
>
> Problem Summary
> For the purpose of chargeback and better understanding of grid usage, we need 
> to be able to associate applications with "projects", e.g. "pipeline X", 
> "property Y".  This would allow us to aggregate on this property, thereby 
> helping us compute grid resource usage for the entire "project".  Currently, 
> for a given application, two things we know about it are the user that 
> submitted it and the queue it was submitted to.  Below, I'll explain why 
> neither of these is adequate for enterprise-level chargeback and 
> understanding resource allocation needs.
> Why Not Users?
> It's not individual users that are paying the bill -- it's projects.  When one 
> of our real users submits an application on a Hadoop grid, they're presumably 
> not usually doing it for themselves.  They're doing work for some project or 
> team effort, so it's that team or project that should be "charged" for all its 
> users' applications.  Maintaining outside lists of associations between users 
> and projects is error-prone because it is time-sensitive and requires 
> continued ongoing maintenance.  New users join organizations, users leave and 
> users even change projects.  Furthermore, users may split their time between 
> multiple projects, making it ambiguous as to which of a user's projects a 
> given application should be charged.  Also, there can be headless users, 
> which can be even more difficult to link to a project and can be shared 
> between teams or projects.
> Why Not Queues?
> The purpose of queues is for scheduling.  Overloading the queues concept to 
> also mean who should be "charged" for an application can have a detrimental 
> effect on the primary purpose of queues.  It could be manageable in the case 
> of a very small number of projects sharing a cluster, but doesn't scale to 
> tens or hundreds of projects sharing a cluster.  If a given cluster is shared 
> between 50 projects, creating 50 separate queues will result in inefficient 
> use of the cluster resources.  Furthermore, a given project may desire more 
> than one queue for different types or priorities of applications.  
> Proposed Solution
> Rather than relying on external tools to infer through the user and/or queue 
> who to "charge" for a given application, I propose a straightforward approach 
> where that information be explicitly supplied when the application is 
> submitted, just like we do with queues.  Let's use a charge card analogy: 
> when you buy something online, you don't just say who you are and how to ship 
> it, you also specify how you're paying for it.  Similarly, when submitting an 
> application in YARN, you could explicitly specify to whom its resource usage 
> should be associated (a project, team, cost center, etc).
> This new configuration parameter should default to being optional, so that 
> organizations not interested in chargeback or project-level resource tracking 
> can happily continue on as if it wasn't there.  However, it should be 
> configurable at the cluster level such that a given cluster could elect 
> to make it required, so that all applications would have an associated 
> project.  The value of this new parameter should be exposed via the Resource 
> Manager UI and Resource Manager REST API, so that users and tools can make 
> use of it for chargeback, utilization metrics, etc.
> I'm undecided on what to name the new parameter, as I like the flexibility in 
> the ways it could be used.  It is essentially just an additional party other 
> than user or queue that an application can be associated with, so its use is 
> not just limited to a chargeback scenario.  For example, an organization not 
> interested in chargeback could still use this parameter to communicate useful 
> information about an application (e.g. pipelineX.stageN) and aggregate like 
> applications.
> Enforcement
> Couldn't users just specify this information as a prefix for 

[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945954#comment-14945954
 ] 

Hadoop QA commented on YARN-3943:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  19m 44s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m  3s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 27s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 16s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 51s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  1s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   8m 45s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  58m 48s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765248/YARN-3943.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6d5713a |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9365/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-nodemanager.html
 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9365/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9365/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9365/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9365/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9365/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9365/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9365/console |


This message was automatically generated.

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configuration 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It would 
> be better to use two configurations: one used when disks go from not-full to 
> full and the other used when disks go from full back to not-full, so that we 
> can avoid oscillating frequently.
> For example: we can set the one for disk-full detection higher than the one 
> for disk-not-full detection.
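This is the usual hysteresis pattern; a minimal sketch with hypothetical threshold values (the real values would come from the two proposed configuration keys):

{code}
public class DiskHealthSketch {
  // Marking a disk full at a higher watermark than the one used to mark it
  // good again prevents rapid oscillation around a single threshold.
  private static final float FULL_THRESHOLD_PERCENT = 90f;     // disk-full detection
  private static final float NOT_FULL_THRESHOLD_PERCENT = 85f; // disk-not-full detection

  private boolean diskFull = false;

  void update(float usedPercent) {
    if (!diskFull && usedPercent > FULL_THRESHOLD_PERCENT) {
      diskFull = true;       // went from not-full to full
    } else if (diskFull && usedPercent < NOT_FULL_THRESHOLD_PERCENT) {
      diskFull = false;      // went from full back to not-full
    }
    // Between the two thresholds the previous state is kept (hysteresis).
  }
}
{code}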



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4175) Example of use YARN-1197

2015-10-06 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945988#comment-14945988
 ] 

MENG DING commented on YARN-4175:
-

Update on my testing results.

Based on my tests of this feature against a 4-node cluster using the modified 
distributed shell app, the only critical issue I found is an NPE in the 
ResourceManager when there is not enough headroom. The issue has been logged in 
YARN-4230. The only other minor issue I can think of is that some logging 
information can be improved, for which I will log a separate (low priority) 
issue.

The tests I performed so far include:
* Verify container resource increase/decrease when there are resources 
available, and no limits are exceeded. Verify container sizes are reported 
correctly on Web UI.
* Verify container resource increase reservation when host doesn't have enough 
resource for the additional allocation. Verify resource reservation information 
on Web UI (Memory Reserved, Lasts Reservation, etc)
* Verify that while an increase reservation is in place on a host, regular and 
increase allocation requests from other applications will be skipped on this 
host.
* Verify that an increase reservation will be fulfilled when enough resource is 
freed up on the host.
* Verify that while increase reservation is in place for a container, a 
decrease request to the same container (with target resource <= original 
resource) will cancel the reservation.
* Verify that pending resource increase request will not be processed when 
there is no headroom left (after applying patch from YARN-4230).
* Verify that invalid resource increase/decrease request will throw exception 
in AMRMClient and distributed shell application master onError callback handler 
will be called.
* Verify that resource monitoring is changed on NM after container 
increase/decrease is completed.
* Verify that killing and restarting NM will recover increased/decreased 
containers if NM work preserving restart is enabled.
* All tests are verified using both DefaultResourceCalculator and 
DominantResourceCalculator.

Let me know if you have any comments or suggestions.

> Example of use YARN-1197
> 
>
> Key: YARN-4175
> URL: https://issues.apache.org/jira/browse/YARN-4175
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: MENG DING
> Attachments: YARN-4175.1.patch
>
>
> Like YARN-2609, we need a example program to demonstrate how to use YARN-1197 
> from end-to-end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4175) Example of use YARN-1197

2015-10-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946015#comment-14946015
 ] 

Wangda Tan commented on YARN-4175:
--

Thanks [~mding] a lot! As an end-to-end test of basic functionality, I think 
this should be enough; more tests could be done when downstream projects such 
as Spark/Slider support container resizing.

Will review pending patches you mentioned shortly.

> Example of use YARN-1197
> 
>
> Key: YARN-4175
> URL: https://issues.apache.org/jira/browse/YARN-4175
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: MENG DING
> Attachments: YARN-4175.1.patch
>
>
> Like YARN-2609, we need a example program to demonstrate how to use YARN-1197 
> from end-to-end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED

2015-10-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946018#comment-14946018
 ] 

Sangjin Lee commented on YARN-3798:
---

I think you can name your patch {{YARN-3798-branch-2.6.001.patch}} for it to be 
tested against branch-2.6.

> ZKRMStateStore shouldn't create new session without occurrance of 
> SESSIONEXPIED
> ---
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: RM.log, YARN-3798-2.7.002.patch, 
> YARN-3798-branch-2.6.01.patch, YARN-3798-branch-2.7.002.patch, 
> YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, 
> YARN-3798-branch-2.7.005.patch, YARN-3798-branch-2.7.006.patch, 
> YARN-3798-branch-2.7.patch
>
>
> RM going down with NoNode exception during create of znode for appattempt
> *Please find the exception logs*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
> out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating appAttempt: appattempt_1433764310492_7152_01
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>  

[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers

2015-10-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946020#comment-14946020
 ] 

Wangda Tan commented on YARN-1509:
--

Thanks [~mding], I think the patch looks good, and your response makes sense to me. 
One nit: could you wrap Log.debug with Log.isDebugEnabled? Will commit once 
Jenkins gets back.
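For reference, the guard being asked for is the usual commons-logging pattern: it avoids building the debug string when debug logging is off. A small sketch with hypothetical names:

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class DebugGuardSketch {
  private static final Log LOG = LogFactory.getLog(DebugGuardSketch.class);

  void onContainersUpdated(Object container) {
    // Without the guard, the argument string is concatenated even when
    // debug logging is disabled; the check makes it effectively free.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Received container update: " + container);
    }
  }
}
{code}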

> Make AMRMClient support send increase container request and get 
> increased/decreased containers
> --
>
> Key: YARN-1509
> URL: https://issues.apache.org/jira/browse/YARN-1509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, 
> YARN-1509.4.patch
>
>
> As described in YARN-1197, we need add API in AMRMClient to support
> 1) Add increase request
> 2) Can get successfully increased/decreased containers from RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4216) Container logs not shown for newly assigned containers after NM recovery

2015-10-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945020#comment-14945020
 ] 

Jason Lowe commented on YARN-4216:
--

If we're decommissioning a node then we're not doing a rolling upgrade of it.  
Decomm of a node should kill all of the containers on the node, upload the 
logs, then shutdown the node.  That's not a rolling upgrade since we lose work. 
 It may be rolling in the sense that we can go through the nodes in a serial 
fashion, but since work is being lost at each step it's significantly different 
than the rolling upgrade with work-preserving restart.

What we're talking about here is reinsertion of a previously decomm'd node that 
ends up running containers for an application that already had logs aggregated 
which is slightly different than the JIRA title which implies work-preserving 
restart.  Having the NM append the new logs would be a reasonable approach to 
try to avoid log loss, although there's the problem of active readers for the 
logs.  If we're appending then we can end up with partially written logs at the 
end when readers come along to parse the logs.  We'd either have to live with 
that possibility or have the NM copy the existing logs to the .tmp file before 
appending the new logs then atomically replacing the previous logs with the new 
version.  Not all filesystems support atomic replace, but HDFS can do it.
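A sketch of the copy-then-atomic-replace approach described above, with hypothetical paths; it assumes the NM has already copied the previous aggregated log into the .tmp file and appended the new containers' logs there, and it uses FileContext's rename with OVERWRITE because HDFS can perform that replace atomically:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Options;
import org.apache.hadoop.fs.Path;

public class LogReplaceSketch {
  // Hypothetical paths standing in for the aggregated log and its .tmp sibling.
  static void replaceAggregatedLog(Configuration conf) throws Exception {
    FileContext fc = FileContext.getFileContext(conf);
    Path finalLog = new Path("/tmp/logs/user/app_1/localhost_38153");
    Path tmpLog = new Path("/tmp/logs/user/app_1/localhost_38153.tmp");

    // tmpLog already holds old logs + newly appended logs (steps omitted here);
    // swap it in atomically so readers never see a partially written file.
    fc.rename(tmpLog, finalLog, Options.Rename.OVERWRITE);
  }
}
{code}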

> Container logs not shown for newly assigned containers  after NM  recovery
> --
>
> Key: YARN-4216
> URL: https://issues.apache.org/jira/browse/YARN-4216
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: NMLog, ScreenshotFolder.png, yarn-site.xml
>
>
> Steps to reproduce
> # Start 2 nodemanagers with NM recovery enabled
> # Submit a pi job with 20 maps
> # Once 5 maps get completed on NM1, stop the NM (yarn daemon stop nodemanager)
> (Logs of all completed containers get aggregated to HDFS)
> # Now start NM1 again and wait for job completion
> *The newly assigned container logs on NM1 are not shown*
> *hdfs log dir state*
> # When logs are aggregated to HDFS during the stop, they are uploaded with name (localhost_38153)
> # On log aggregation after starting the NM, the newly assigned container logs get 
> uploaded with name (localhost_38153.tmp)
> In the history server, the logs are not shown for the new task attempts



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4230) Increasing container resource while there is no headroom left will cause ResourceManager to crash

2015-10-06 Thread MENG DING (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MENG DING updated YARN-4230:

Attachment: YARN-4230.1.patch

The fix is simple. Attaching the patch with an added test case.
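Without claiming this is what the attached patch does, the general shape of such a guard is to check the remaining headroom and skip the increase request before walking into the allocation path; a hedged sketch with hypothetical names:

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class IncreaseGuardSketch {
  // Skip an increase request when the extra resource does not fit in the
  // user's remaining headroom, instead of proceeding and hitting an NPE later.
  static boolean canIncrease(Resource headroom, Resource extra) {
    if (headroom == null) {
      return false;
    }
    return Resources.fitsIn(extra, headroom);
  }
}
{code}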

> Increasing container resource while there is no headroom left will cause 
> ResourceManager to crash
> -
>
> Key: YARN-4230
> URL: https://issues.apache.org/jira/browse/YARN-4230
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
>Priority: Critical
> Attachments: YARN-4230.1.patch
>
>
> This issue was found while doing end-to-end test of YARN-1197 in YARN-4175.
> When increasing resource of a container, if there is no headroom left for the 
> user, the ResourceManager crashes with NPE.
> The following is the stack trace:
> {code}
> 15/10/05 20:35:21 INFO capacity.ParentQueue: assignedContainer queue=root 
> usedCapacity=0.9375 absoluteUsedCapacity=0.9375 used= 
> cluster=
> 15/10/05 20:35:49 FATAL resourcemanager.ResourceManager: Error in handling 
> event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.IncreaseContainerAllocator.assignContainers(IncreaseContainerAllocator.java:327)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(ContainerAllocator.java:66)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(FiCaSchedulerApp.java:474)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:819)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:572)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:423)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1177)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1274)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:134)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:691)
> at java.lang.Thread.run(Thread.java:745)
> 15/10/05 20:35:49 INFO resourcemanager.ResourceManager: Exiting, bbye..
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4229) Support max-am-resource-percentage per label partition for User

2015-10-06 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4229:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-2492

> Support max-am-resource-percentage per label partition for User
> ---
>
> Key: YARN-4229
> URL: https://issues.apache.org/jira/browse/YARN-4229
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
>
> Similar to YARN-3216, we need to support per-user-per-partition level max AM 
> resource percentage also. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-261) Ability to kill AM attempts

2015-10-06 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945258#comment-14945258
 ] 

Rohith Sharma K S commented on YARN-261:


Thanks [~jlowe] for sharing your thoughts.
While rebasing the patch, I had a look at the code for KILL too, and I find both 
can be supported with minimal change. If the change turns out to be larger, 
subtasks can be created for the server-side and client-side changes. As a first 
step I will implement a prototype for supporting KILL and test it. For the 
current patch, I will add more functional tests to make the regression coverage 
stronger.

> Ability to kill AM attempts
> ---
>
> Key: YARN-261
> URL: https://issues.apache.org/jira/browse/YARN-261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-261.patch, YARN-261--n2.patch, 
> YARN-261--n3.patch, YARN-261--n4.patch, YARN-261--n5.patch, 
> YARN-261--n6.patch, YARN-261--n7.patch, YARN-261.patch
>
>
> It would be nice if clients could ask for an AM attempt to be killed.  This 
> is analogous to the task attempt kill support provided by MapReduce.
> This feature would be useful in a scenario where AM retries are enabled, the 
> AM supports recovery, and a particular AM attempt is stuck.  Currently if 
> this occurs the user's only recourse is to kill the entire application, 
> requiring them to resubmit a new application and potentially breaking 
> downstream dependent jobs if it's part of a bigger workflow.  Killing the 
> attempt would allow a new attempt to be started by the RM without killing the 
> entire application, and if the AM supports recovery it could potentially save 
> a lot of work.  It could also be useful in workflow scenarios where the 
> failure of the entire application kills the workflow, but the ability to kill 
> an attempt can keep the workflow going if the subsequent attempt succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4230) Increasing container resource while there is no headroom left will cause ResourceManager to crash

2015-10-06 Thread MENG DING (JIRA)
MENG DING created YARN-4230:
---

 Summary: Increasing container resource while there is no headroom 
left will cause ResourceManager to crash
 Key: YARN-4230
 URL: https://issues.apache.org/jira/browse/YARN-4230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: MENG DING
Assignee: MENG DING
Priority: Critical


This issue was found while doing end-to-end test of YARN-1197 in YARN-4175.

When increasing resource of a container, if there is no headroom left for the 
user, the ResourceManager crashes with NPE.

The following is the stack trace:

{code}
15/10/05 20:35:21 INFO capacity.ParentQueue: assignedContainer queue=root 
usedCapacity=0.9375 absoluteUsedCapacity=0.9375 used= 
cluster=
15/10/05 20:35:49 FATAL resourcemanager.ResourceManager: Error in handling 
event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.IncreaseContainerAllocator.assignContainers(IncreaseContainerAllocator.java:327)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(ContainerAllocator.java:66)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(FiCaSchedulerApp.java:474)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:819)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:572)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:423)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1177)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1274)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:134)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:691)
at java.lang.Thread.run(Thread.java:745)
15/10/05 20:35:49 INFO resourcemanager.ResourceManager: Exiting, bbye..
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4229) Support max-am-resource-percentage per label partition for User

2015-10-06 Thread Sunil G (JIRA)
Sunil G created YARN-4229:
-

 Summary: Support max-am-resource-percentage per label partition 
for User
 Key: YARN-4229
 URL: https://issues.apache.org/jira/browse/YARN-4229
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Reporter: Sunil G
Assignee: Sunil G


Similar to YARN-3216, we need to support per-user-per-partition level max AM 
resource percentage also. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4000) RM crashes with NPE if leaf queue becomes parent queue during restart

2015-10-06 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945318#comment-14945318
 ] 

Rohith Sharma K S commented on YARN-4000:
-

[~varun_saxena] I see many changes related to diagnostics messages are 
incorporated in this patch. YARN-4111 is the JIRA intended for adding the 
diagnostics message for the kill transition. If required, YARN-4111 can be 
generalized to cover the changes done as part of the current patch. 
Would you mind moving the diagnostics-message changes to a patch on 
YARN-4111, and letting YARN-4111 go in first?

> RM crashes with NPE if leaf queue becomes parent queue during restart
> -
>
> Key: YARN-4000
> URL: https://issues.apache.org/jira/browse/YARN-4000
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-4000.01.patch, YARN-4000.02.patch, 
> YARN-4000.03.patch, YARN-4000.04.patch, YARN-4000.05.patch
>
>
> This is a similar situation to YARN-2308.  If an application is active in 
> queue A and then the RM restarts with a changed capacity scheduler 
> configuration where queue A becomes a parent queue to other subqueues then 
> the RM will crash with a NullPointerException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)