[jira] [Commented] (YARN-3614) FileSystemRMStateStore throws an exception when it fails to remove an application, causing the resourcemanager to crash

2015-05-11 Thread lachisis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537625#comment-14537625
 ] 

lachisis commented on YARN-3614:


Yes, it is ok to check the existence of the directory first.
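For illustration, a minimal sketch of that idea (not the actual patch; the class and method names below are hypothetical, modeled on the deleteFile call in the stack trace): a path that is already gone is logged as a warning instead of becoming a fatal state-store error.

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SafeDeleteSketch {
  private static final Log LOG = LogFactory.getLog(SafeDeleteSketch.class);

  // Tolerant delete: skip and warn when the path no longer exists,
  // only fail when an existing path cannot be deleted.
  static void deleteIfExists(FileSystem fs, Path deletePath) throws Exception {
    if (!fs.exists(deletePath)) {
      LOG.warn("Path " + deletePath + " does not exist, nothing to delete");
      return;
    }
    if (!fs.delete(deletePath, true)) {
      throw new Exception("Failed to delete " + deletePath);
    }
  }
}
{code}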

 FileSystemRMStateStore throws an exception when it fails to remove an 
 application, causing the resourcemanager to crash
 -

 Key: YARN-3614
 URL: https://issues.apache.org/jira/browse/YARN-3614
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: lachisis
Priority: Critical

 FileSystemRMStateStore is only an auxiliary plug-in of the RM state store. 
 When it fails to remove an application, I think a warning is enough, but 
 currently the resourcemanager crashes.
 Recently, I configured 
 yarn.resourcemanager.state-store.max-completed-applications to limit the 
 number of applications kept in the rmstore. When the number of applications 
 exceeds the limit, some old applications are removed. If the removal fails, 
 the resourcemanager crashes.
 The following is the log: 
 2015-05-11 06:58:43,815 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Removing 
 info for app: application_1430994493305_0053
 2015-05-11 06:58:43,815 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
  Removing info for app: application_1430994493305_0053 at: 
 /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
 2015-05-11 06:58:43,816 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
 removing app: application_1430994493305_0053
 java.lang.Exception: Failed to delete 
 /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:879)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:874)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:745)
 2015-05-11 06:58:43,819 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
 org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause:
 java.lang.Exception: Failed to delete 
 /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 


[jira] [Commented] (YARN-3557) Support Intel Trusted Execution Technology(TXT) in YARN scheduler

2015-05-11 Thread Dian Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537681#comment-14537681
 ] 

Dian Fu commented on YARN-3557:
---

Hi [~leftnoteasy],
I have posted the requirements about supporting configuration of constraint 
node labels from both the RM and the NM on YARN-3409. What are your thoughts 
about supporting script-based node label configuration on the RM side?

 Support Intel Trusted Execution Technology(TXT) in YARN scheduler
 -

 Key: YARN-3557
 URL: https://issues.apache.org/jira/browse/YARN-3557
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Dian Fu
 Attachments: Support TXT in YARN high level design doc.pdf


 Intel TXT defines platform-level enhancements that provide the building 
 blocks for creating trusted platforms. A TXT aware YARN scheduler can 
 schedule security sensitive jobs on TXT enabled nodes only. YARN-2492 
 provides the capacity to restrict YARN applications to run only on cluster 
 nodes that have a specified node label. This is a good mechanism that can be 
 utilized by a TXT-aware YARN scheduler.
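 As a rough illustration of how an application could use that mechanism (a 
 sketch only; the label name TRUSTED is hypothetical, and the snippet assumes 
 the node-label expression field on ResourceRequest from the YARN-2492 work):
 {code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class TrustedRequestSketch {
  // Ask for a container only on nodes carrying a (hypothetical) "TRUSTED"
  // node label, so security-sensitive work lands on TXT-enabled nodes.
  static ResourceRequest newTrustedRequest() {
    ResourceRequest req = ResourceRequest.newInstance(
        Priority.newInstance(0),         // request priority
        ResourceRequest.ANY,             // any host
        Resource.newInstance(1024, 1),   // 1 GB, 1 vcore
        1);                              // one container
    req.setNodeLabelExpression("TRUSTED");
    return req;
  }
}
 {code}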



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (YARN-3614) FileSystemRMStateStore throws an exception when it fails to remove an application, causing the resourcemanager to crash

2015-05-11 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537662#comment-14537662
 ] 

Brahma Reddy Battula commented on YARN-3614:


{quote}when the standby resourcemanager tries to transitionToActive, it takes 
more than ten minutes to load applications{quote}
Did you dig into this one, i.e. why it took more than ten minutes? Thanks


[jira] [Commented] (YARN-3614) FileSystemRMStateStore throws an exception when it fails to remove an application, causing the resourcemanager to crash

2015-05-11 Thread lachisis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537640#comment-14537640
 ] 

lachisis commented on YARN-3614:


I use YARN HA for a stable service. 
Months later, I found that when the standby resourcemanager tries to 
transitionToActive, it takes more than ten minutes to load applications. So I 
backed up the rmstore in HDFS and changed the configuration property 
yarn.resourcemanager.state-store.max-completed-applications to limit the 
number of applications in the rmstore (see the example below), and found that 
the transition then worked well.
Later my partner restored the backed-up rmstore and submitted a new 
application, and then the resourcemanager crashed.

I know that restoring a backed-up rmstore while the resourcemanager is running 
is not appropriate, but it also shows that the handling logic of 
FileSystemRMStateStore is a little weak. So I suggest a small change here.
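For reference, the property mentioned above is set in yarn-site.xml; a minimal 
example (the value 10000 is only an illustration, not a recommendation):

{code}
<!-- Cap the number of completed applications kept in the RM state store
     (example value only). -->
<property>
  <name>yarn.resourcemanager.state-store.max-completed-applications</name>
  <value>10000</value>
</property>
{code}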
 




[jira] [Commented] (YARN-3409) Add constraint node labels

2015-05-11 Thread Dian Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537676#comment-14537676
 ] 

Dian Fu commented on YARN-3409:
---

Just to post the requirements discussed in YARN-3557 here: it should be 
possible to add constraint node labels from both the RM and the NM, since 
labels such as TRUSTED/UNTRUSTED described in YARN-3557 need to be added from 
the RM, while labels such as GPU, FPGA, LINUX, and WINDOWS are more suitable 
to be added from the NM. A large cluster may have all of these kinds of 
labels coexisting.

 Add constraint node labels
 --

 Key: YARN-3409
 URL: https://issues.apache.org/jira/browse/YARN-3409
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, capacityscheduler, client
Reporter: Wangda Tan
Assignee: Wangda Tan

 Specifying only one label for each node (in other words, partitioning a 
 cluster) is a way to determine how the resources of a particular set of nodes 
 can be shared by a group of entities (teams, departments, etc.). Partitions 
 of a cluster have the following characteristics:
 - The cluster is divided into several disjoint sub-clusters.
 - ACLs/priority can be applied to a partition (only the market team has 
 priority to use the partition).
 - Capacity percentages can be applied to a partition (the market team has 40% 
 minimum capacity and the dev team has 60% minimum capacity of the partition).
 Constraints are orthogonal to partitions; they describe attributes of a 
 node's hardware/software, just for affinity. Some examples of constraints:
 - glibc version
 - JDK version
 - Type of CPU (x86_64/i686)
 - Type of OS (windows, linux, etc.)
 With this, an application can ask for resources that satisfy (glibc.version = 
 2.20 && JDK.version = 8u20 && x86_64).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3614) FileSystemRMStateStore throws an exception when it fails to remove an application, causing the resourcemanager to crash

2015-05-11 Thread lachisis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537632#comment-14537632
 ] 

lachisis commented on YARN-3614:


Sorry, terrible network. How can I delete the repeated replies?


[jira] [Commented] (YARN-3614) FileSystemRMStateStore throws an exception when it fails to remove an application, causing the resourcemanager to crash

2015-05-11 Thread lachisis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537645#comment-14537645
 ] 

lachisis commented on YARN-3614:


Thanks for the chance to provide the patch.
I will submit the patch later.


[jira] [Commented] (YARN-3170) YARN architecture document needs updating

2015-05-11 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537726#comment-14537726
 ] 

Tsuyoshi Ozawa commented on YARN-3170:
--

[~brahmareddy] thank you for updating.

{quote} 
We call MapReduce running on YARN MapReduce 2.0 (MRv2).
{quote}

A trailing double quotation is missing. Please add it before the period. 

 YARN architecture document needs updating
 -

 Key: YARN-3170
 URL: https://issues.apache.org/jira/browse/YARN-3170
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Allen Wittenauer
Assignee: Brahma Reddy Battula
  Labels: BB2015-05-TBR
 Attachments: YARN-3170-002.patch, YARN-3170-003.patch, YARN-3170.patch


 The marketing paragraph at the top, NextGen MapReduce, etc are all 
 marketing rather than actual descriptions. It also needs some general 
 updates, esp given it reads as though 0.23 was just released yesterday.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-817) If input path does not exist application/job id is getting assigned.

2015-05-11 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537817#comment-14537817
 ] 

Rohith commented on YARN-817:
-

The input path is used by the application JVM. The application client should 
validate it before submitting the application to YARN (see the sketch below). 
Closing as Invalid; please reopen if there is any concern about this.
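For illustration, a minimal sketch of such a client-side check using the 
standard FileSystem API (the class and method names are hypothetical):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class InputPathCheckSketch {
  // Fail fast on the client side if the input path is missing, before an
  // application id is ever requested from the RM.
  static void verifyInputPath(Configuration conf, String input) throws Exception {
    Path inputPath = new Path(input);
    FileSystem fs = inputPath.getFileSystem(conf);
    if (!fs.exists(inputPath)) {
      throw new IllegalArgumentException("Input path does not exist: " + inputPath);
    }
  }
}
{code}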

 If input path does not exist application/job id is getting assigned.
 

 Key: YARN-817
 URL: https://issues.apache.org/jira/browse/YARN-817
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Nishan Shetty
Priority: Minor

 1. Run a job with an input path that does not exist.
 2. An application/job id is still assigned:
 2013-06-12 16:00:24,494 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new 
 applicationId: 12
 Suggestion
 Before assigning the job/app id, a check on the input path could be made.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-817) If input path does not exist application/job id is getting assigned.

2015-05-11 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith resolved YARN-817.
-
Resolution: Invalid

 If input path does not exist application/job id is getting assigned.
 

 Key: YARN-817
 URL: https://issues.apache.org/jira/browse/YARN-817
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Nishan Shetty
Priority: Minor

 1. Run a job with an input path that does not exist.
 2. An application/job id is still assigned:
 2013-06-12 16:00:24,494 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new 
 applicationId: 12
 Suggestion
 Before assigning the job/app id, a check on the input path could be made.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3170) YARN architecture document needs updating

2015-05-11 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3170:
---
Attachment: YARN-3170-004.patch

 YARN architecture document needs updating
 -

 Key: YARN-3170
 URL: https://issues.apache.org/jira/browse/YARN-3170
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Allen Wittenauer
Assignee: Brahma Reddy Battula
  Labels: BB2015-05-TBR
 Attachments: YARN-3170-002.patch, YARN-3170-003.patch, 
 YARN-3170-004.patch, YARN-3170.patch


 The marketing paragraph at the top, NextGen MapReduce, etc are all 
 marketing rather than actual descriptions. It also needs some general 
 updates, esp given it reads as though 0.23 was just released yesterday.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3170) YARN architecture document needs updating

2015-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537780#comment-14537780
 ] 

Hadoop QA commented on YARN-3170:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   2m 53s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | release audit |   0m 20s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   2m 57s | Site still builds. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| | |   6m 13s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12731885/YARN-3170-004.patch |
| Optional Tests | site |
| git revision | trunk / 3fa2efc |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7860/console |


This message was automatically generated.

 YARN architecture document needs updating
 -

 Key: YARN-3170
 URL: https://issues.apache.org/jira/browse/YARN-3170
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Allen Wittenauer
Assignee: Brahma Reddy Battula
  Labels: BB2015-05-TBR
 Attachments: YARN-3170-002.patch, YARN-3170-003.patch, 
 YARN-3170-004.patch, YARN-3170.patch


 The marketing paragraph at the top, NextGen MapReduce, etc are all 
 marketing rather than actual descriptions. It also needs some general 
 updates, esp given it reads as though 0.23 was just released yesterday.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3170) YARN architecture document needs updating

2015-05-11 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537772#comment-14537772
 ] 

Brahma Reddy Battula commented on YARN-3170:


[~ozawa] updated the patch. Kindly review. Thanks.

 YARN architecture document needs updating
 -

 Key: YARN-3170
 URL: https://issues.apache.org/jira/browse/YARN-3170
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Allen Wittenauer
Assignee: Brahma Reddy Battula
  Labels: BB2015-05-TBR
 Attachments: YARN-3170-002.patch, YARN-3170-003.patch, 
 YARN-3170-004.patch, YARN-3170.patch


 The marketing paragraph at the top, NextGen MapReduce, etc are all 
 marketing rather than actual descriptions. It also needs some general 
 updates, esp given it reads as though 0.23 was just released yesterday.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3615) Yarn and Mapred queue CLI command support for Fairscheduler

2015-05-11 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-3615:
--

 Summary: Yarn and Mapred queue CLI command support for 
Fairscheduler
 Key: YARN-3615
 URL: https://issues.apache.org/jira/browse/YARN-3615
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, scheduler
Reporter: Bibin A Chundatt
Assignee: Naganarasimha G R


Add support for the CLI commands when the Fair Scheduler is configured.
Listed below are a few commands which need updating:

./yarn queue -status job-queue-name

*Current output*
{code}
Queue Name : root.sls_queue_2
State : RUNNING
Capacity : 100.0%
Current Capacity : 100.0%
Maximum Capacity : -100.0%
Default Node Label expression :
Accessible Node Labels :

{code}
./mapred queue -info job-queue-name 
./mapred queue  -list

All of the listed commands currently display output based on the Capacity Scheduler. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3513) Remove unused variables in ContainersMonitorImpl and add debug log for overall resource usage by all containers

2015-05-11 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3513:

Attachment: YARN-3513.20150511-1.patch

Ok [~devaraj.k], updated the patch as per your suggestion.

 Remove unused variables in ContainersMonitorImpl and add debug log for 
 overall resource usage by all containers 
 

 Key: YARN-3513
 URL: https://issues.apache.org/jira/browse/YARN-3513
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Trivial
  Labels: BB2015-05-TBR, newbie
 Attachments: YARN-3513.20150421-1.patch, YARN-3513.20150503-1.patch, 
 YARN-3513.20150506-1.patch, YARN-3513.20150507-1.patch, 
 YARN-3513.20150508-1.patch, YARN-3513.20150508-1.patch, 
 YARN-3513.20150511-1.patch


 Some local variables in MonitoringThread.run(), {{vmemStillInUsage}} and 
 {{pmemStillInUsage}}, are never read; they are only updated. 
 Instead, we should add a debug log for the overall resource usage of all containers.
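 For illustration, the kind of aggregate debug log meant here could look like 
 the following sketch (class, method, and parameter names are hypothetical, 
 not the actual patch):
 {code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class ContainersUsageLogSketch {
  private static final Log LOG = LogFactory.getLog(ContainersUsageLogSketch.class);

  // Called once per monitoring interval with the usage summed over all
  // containers, instead of keeping per-loop variables that are never read.
  static void logOverallUsage(long vmemUsedTotal, long pmemUsedTotal, int numContainers) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Total resource usage of all containers: vmem = " + vmemUsedTotal
          + " bytes, pmem = " + pmemUsedTotal + " bytes, #containers = " + numContainers);
    }
  }
}
 {code}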



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3587) Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.

2015-05-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537869#comment-14537869
 ] 

Hudson commented on YARN-3587:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7790 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7790/])
YARN-3587. Fix the javadoc of DelegationTokenSecretManager in yarn, etc. 
projects. Contributed by Gabor Liptak. (junping_du: rev 
7e543c27fa2881aa65967be384a6203bd5b2304f)
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMDelegationTokenSecretManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineDelegationTokenSecretManagerService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JHSDelegationTokenSecretManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/token/delegation/DelegationTokenSecretManager.java


 Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.
 -

 Key: YARN-3587
 URL: https://issues.apache.org/jira/browse/YARN-3587
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Akira AJISAKA
Assignee: Gabor Liptak
Priority: Minor
  Labels: newbie
 Fix For: 2.8.0

 Attachments: YARN-3587.1.patch, YARN-3587.patch


 In RMDelegationTokenSecretManager and TimelineDelegationTokenSecretManager,  
 the javadoc of the constructor is as follows:
 {code}
   /**
    * Create a secret manager
    * @param delegationKeyUpdateInterval the number of seconds for rolling new
    *          secret keys.
    * @param delegationTokenMaxLifetime the maximum lifetime of the delegation
    *          tokens
    * @param delegationTokenRenewInterval how often the tokens must be renewed
    * @param delegationTokenRemoverScanInterval how often the tokens are
    *          scanned for expired tokens
    */
 {code}
 1. "the number of seconds" should be "the number of milliseconds".
 2. It's better to add the time unit to the descriptions of the other parameters (see the sketch below).
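 For example, applying the two points above, the corrected javadoc might read 
 roughly as follows (a sketch; the wording in the actual patch may differ):
 {code}
  /**
   * Create a secret manager
   * @param delegationKeyUpdateInterval the number of milliseconds for rolling
   *          new secret keys.
   * @param delegationTokenMaxLifetime the maximum lifetime of the delegation
   *          tokens, in milliseconds
   * @param delegationTokenRenewInterval how often the tokens must be renewed,
   *          in milliseconds
   * @param delegationTokenRemoverScanInterval how often the tokens are scanned
   *          for expired tokens, in milliseconds
   */
 {code}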



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3596) Fix the javadoc of DelegationTokenSecretManager in hadoop-common

2015-05-11 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du resolved YARN-3596.
--
Resolution: Duplicate

 Fix the javadoc of DelegationTokenSecretManager in hadoop-common
 

 Key: YARN-3596
 URL: https://issues.apache.org/jira/browse/YARN-3596
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Gabor Liptak
Priority: Trivial
 Attachments: YARN-3596.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3597) Fix the javadoc of DelegationTokenSecretManager in hadoop-hdfs

2015-05-11 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du resolved YARN-3597.
--
Resolution: Duplicate

 Fix the javadoc of DelegationTokenSecretManager in hadoop-hdfs
 --

 Key: YARN-3597
 URL: https://issues.apache.org/jira/browse/YARN-3597
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Gabor Liptak
Priority: Trivial
 Attachments: YARN-3597.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3587) Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.

2015-05-11 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3587:
-
Hadoop Flags: Reviewed

 Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.
 -

 Key: YARN-3587
 URL: https://issues.apache.org/jira/browse/YARN-3587
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Akira AJISAKA
Assignee: Gabor Liptak
Priority: Minor
  Labels: newbie
 Fix For: 2.8.0

 Attachments: YARN-3587.1.patch, YARN-3587.patch


 In RMDelegationTokenSecretManager and TimelineDelegationTokenSecretManager,  
 the javadoc of the constructor is as follows:
 {code}
   /**
    * Create a secret manager
    * @param delegationKeyUpdateInterval the number of seconds for rolling new
    *          secret keys.
    * @param delegationTokenMaxLifetime the maximum lifetime of the delegation
    *          tokens
    * @param delegationTokenRenewInterval how often the tokens must be renewed
    * @param delegationTokenRemoverScanInterval how often the tokens are
    *          scanned for expired tokens
    */
 {code}
 1. "the number of seconds" should be "the number of milliseconds".
 2. It's better to add the time unit to the descriptions of the other parameters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3598) Fix the javadoc of DelegationTokenSecretManager in hadoop-mapreduce

2015-05-11 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du resolved YARN-3598.
--
Resolution: Duplicate

 Fix the javadoc of DelegationTokenSecretManager in hadoop-mapreduce
 ---

 Key: YARN-3598
 URL: https://issues.apache.org/jira/browse/YARN-3598
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Gabor Liptak
Priority: Trivial
 Attachments: YARN-3598.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3276) Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings

2015-05-11 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537911#comment-14537911
 ] 

Junping Du commented on YARN-3276:
--

Thanks [~zjshen] for the review and comments!
bq. TimelineServiceUtils -> TimelineServiceHelper?
Sure. Will update it.

bq.  Is mapreduce using it? Maybe simply @Private
In my understanding, @Private means it can be used by Common, HDFS, MapReduce, 
and YARN, so it would be broader than the current limitation? I didn't remove 
MapReduce here because, judging from other places, we always seem to keep 
MapReduce listed as a practice even when there is no obvious reference from the 
MR project. Maybe it's better to keep it here as it is?

bq. TimelineEvent are not covered?
Nice catch! Will update it.

bq. AllocateResponsePBImpl change is not related?
Yes. There are several findbugs warnings (this one and the change in 
TimelineMetric.java) involved in the previous patch on branch YARN-2928. I 
think it would be overkill to file a separate JIRA to fix these simple issues, 
so I put the fix here and updated the title a little bit. Does that make sense?

 Refactor and fix null casting in some map cast for TimelineEntity (old and 
 new) and fix findbug warnings
 

 Key: YARN-3276
 URL: https://issues.apache.org/jira/browse/YARN-3276
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3276-YARN-2928.v3.patch, 
 YARN-3276-YARN-2928.v4.patch, YARN-3276-v2.patch, YARN-3276-v3.patch, 
 YARN-3276.patch


 Per discussion in YARN-3087, we need to refactor some similar logic to cast 
 map to hashmap and get rid of NPE issue.
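 As a rough illustration of the refactoring being discussed (the helper name
 follows the rename suggested in the comments above; the actual signature in
 the patch may differ), the shared logic could look like:
 {code}
import java.util.HashMap;
import java.util.Map;

public final class TimelineServiceHelper {
  private TimelineServiceHelper() {}

  /** Cast or copy a Map into a HashMap, guarding against null to avoid NPEs. */
  public static <K, V> HashMap<K, V> mapCastToHashMap(Map<K, V> originalMap) {
    if (originalMap == null) {
      return new HashMap<K, V>();   // callers no longer need their own null checks
    }
    if (originalMap instanceof HashMap) {
      return (HashMap<K, V>) originalMap;
    }
    return new HashMap<K, V>(originalMap);
  }
}
 {code}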



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3276) Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings

2015-05-11 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3276:
-
Attachment: YARN-3276-YARN-2928.v5.patch

Fixed most of [~zjshen]'s comments in the v5 patch.

 Refactor and fix null casting in some map cast for TimelineEntity (old and 
 new) and fix findbug warnings
 

 Key: YARN-3276
 URL: https://issues.apache.org/jira/browse/YARN-3276
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3276-YARN-2928.v3.patch, 
 YARN-3276-YARN-2928.v4.patch, YARN-3276-YARN-2928.v5.patch, 
 YARN-3276-v2.patch, YARN-3276-v3.patch, YARN-3276.patch


 Per discussion in YARN-3087, we need to refactor some similar logic to cast 
 map to hashmap and get rid of NPE issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3513) Remove unused variables in ContainersMonitorImpl and add debug log for overall resource usage by all containers

2015-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537832#comment-14537832
 ] 

Hadoop QA commented on YARN-3513:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 33s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 29s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 32s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 36s | The applied patch generated  1 
new checkstyle issues (total was 27, now 27). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m  1s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   5m 57s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  41m 44s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12731901/YARN-3513.20150511-1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3fa2efc |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/7861/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7861/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7861/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7861/console |


This message was automatically generated.

 Remove unused variables in ContainersMonitorImpl and add debug log for 
 overall resource usage by all containers 
 

 Key: YARN-3513
 URL: https://issues.apache.org/jira/browse/YARN-3513
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Trivial
  Labels: BB2015-05-TBR, newbie
 Attachments: YARN-3513.20150421-1.patch, YARN-3513.20150503-1.patch, 
 YARN-3513.20150506-1.patch, YARN-3513.20150507-1.patch, 
 YARN-3513.20150508-1.patch, YARN-3513.20150508-1.patch, 
 YARN-3513.20150511-1.patch


 Some local variables in MonitoringThread.run()  : {{vmemStillInUsage and 
 pmemStillInUsage}} are not used and just updated. 
 Instead we need to add debug log for overall resource usage by all containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3409) Add constraint node labels

2015-05-11 Thread David Villegas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537858#comment-14537858
 ] 

David Villegas commented on YARN-3409:
--

Thanks for your comment, Wangda. 

I agree that loadAvg may not be useful in all cases. The main idea behind 
dynamic label values is that the system would be more extensible, and human 
errors would be reduced, if some of the labels could be populated 
automatically. An example that comes to mind, based on Dian's comment, is the 
NodeManager's operating system: rather than having an administrator set it, it 
could be pre-set to the actual OS by the NM.

 Add constraint node labels
 --

 Key: YARN-3409
 URL: https://issues.apache.org/jira/browse/YARN-3409
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, capacityscheduler, client
Reporter: Wangda Tan
Assignee: Wangda Tan

 Specifying only one label for each node (in other words, partitioning a 
 cluster) is a way to determine how the resources of a particular set of nodes 
 can be shared by a group of entities (like teams, departments, etc.). 
 Partitions of a cluster have the following characteristics:
 - The cluster is divided into several disjoint sub-clusters.
 - ACL/priority can apply to a partition (only the market team / dev team has 
 priority to use the partition).
 - Percentages of capacity can apply to a partition (the market team has 40% 
 minimum capacity and the dev team has 60% of the minimum capacity of the 
 partition).
 Constraints are orthogonal to partitions; they describe attributes of a 
 node's hardware/software, purely for affinity. Some examples of constraints:
 - glibc version
 - JDK version
 - Type of CPU (x86_64/i686)
 - Type of OS (windows, linux, etc.)
 With this, an application can ask for resources that have (glibc.version >= 
 2.20 && JDK.version >= 8u20 && x86_64).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3587) Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.

2015-05-11 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3587:
-
Summary: Fix the javadoc of DelegationTokenSecretManager in projects of 
yarn, etc.  (was: Fix the javadoc of DelegationTokenSecretManager in yarn 
project)

 Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.
 -

 Key: YARN-3587
 URL: https://issues.apache.org/jira/browse/YARN-3587
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Akira AJISAKA
Assignee: Gabor Liptak
Priority: Minor
  Labels: newbie
 Attachments: YARN-3587.1.patch, YARN-3587.patch


 In RMDelegationTokenSecretManager and TimelineDelegationTokenSecretManager,  
 the javadoc of the constructor is as follows:
 {code}
  /**
   * Create a secret manager
   * @param delegationKeyUpdateInterval the number of seconds for rolling new
   *        secret keys.
   * @param delegationTokenMaxLifetime the maximum lifetime of the delegation
   *        tokens
   * @param delegationTokenRenewInterval how often the tokens must be renewed
   * @param delegationTokenRemoverScanInterval how often the tokens are scanned
   *        for expired tokens
   */
 {code}
 1. the number of seconds should be the number of milliseconds.
 2. It's better to add time unit to the description of other parameters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3614) FileSystemRMStateStore throw exception when failed to remove application, that cause resourcemanager to crash

2015-05-11 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537837#comment-14537837
 ] 

nijel commented on YARN-3614:
-

Hi @lachisis,
bq. when standby resourcemanager try to transitiontoActive, it will cost more 
than ten minutes to load applications
Is this a secure cluster?

 FileSystemRMStateStore throw exception when failed to remove application, 
 that cause resourcemanager to crash
 -

 Key: YARN-3614
 URL: https://issues.apache.org/jira/browse/YARN-3614
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: lachisis
Priority: Critical

 FileSystemRMStateStore is only a accessorial plug-in of rmstore. 
 When it failed to remove application, I think warning is enough, but now 
 resourcemanager crashed.
 Recently, I configure 
 yarn.resourcemanager.state-store.max-completed-applications  to limit 
 applications number in rmstore. when applications number exceed the limit, 
 some old applications will be removed. If failed to remove, resourcemanager 
 will crash.
 The following is log: 
 2015-05-11 06:58:43,815 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Removing 
 info for app: application_1430994493305_0053
 2015-05-11 06:58:43,815 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
  Removing info for app: application_1430994493305_0053 at: 
 /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
 2015-05-11 06:58:43,816 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
 removing app: application_1430994493305_0053
 java.lang.Exception: Failed to delete 
 /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:879)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:874)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:745)
 2015-05-11 06:58:43,819 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
 org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause:
 java.lang.Exception: Failed to delete 
 /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 

[jira] [Resolved] (YARN-3599) Fix the javadoc of DelegationTokenSecretManager in hadoop-yarn

2015-05-11 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du resolved YARN-3599.
--
Resolution: Duplicate

 Fix the javadoc of DelegationTokenSecretManager in hadoop-yarn
 --

 Key: YARN-3599
 URL: https://issues.apache.org/jira/browse/YARN-3599
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Gabor Liptak
Priority: Trivial
 Attachments: YARN-3599.1.patch, YARN-3599.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-401) ClientRMService.getQueueInfo can return stale application reports

2015-05-11 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved YARN-401.
-
Resolution: Duplicate

This was fixed by YARN-2978.

 ClientRMService.getQueueInfo can return stale application reports
 -

 Key: YARN-401
 URL: https://issues.apache.org/jira/browse/YARN-401
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.2-alpha, 0.23.6
Reporter: Jason Lowe
Priority: Minor

 ClientRMService.getQueueInfo is modifying a QueueInfo object when application 
 reports are requested.  Unfortunately this QueueInfo object could be a 
 persisting object in the scheduler, and modifying it in this way can lead to 
 stale application reports being returned to the client.  Here's an example 
 scenario with CapacityScheduler:
 # A client asks for queue info on queue X with application reports
 # ClientRMService.getQueueInfo modifies the queue's QueueInfo object and sets 
 application reports on it
 # Another client asks for recursive queue info from the root queue without 
 application reports
 # Since the old application reports are still attached to queue X's QueueInfo 
 object, these stale reports appear in the QueueInfo data for queue X in the 
 results
 Normally if the client is not asking for application reports it won't be 
 looking for and act upon any application reports that happen to appear in the 
 queue info result.  However we shouldn't be returning application reports in 
 the first place, and when we do, they shouldn't be stale.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3422) relatedentities always return empty list when primary filter is set

2015-05-11 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537962#comment-14537962
 ] 

Billie Rinaldi commented on YARN-3422:
--

That's true, changing the name to indicate direction would also be helpful.  I 
think that fixing this limitation would complicate the write path significantly 
and is probably not worthwhile in ATS v1.  If someone were to implement it, we 
would need to take before and after performance measurements and possibly make 
the new feature optional.

 relatedentities always return empty list when primary filter is set
 ---

 Key: YARN-3422
 URL: https://issues.apache.org/jira/browse/YARN-3422
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Chang Li
Assignee: Chang Li
 Attachments: YARN-3422.1.patch


 When you curl for ats entities with a primary filter, the relatedentities 
 fields always return empty list



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3587) Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.

2015-05-11 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538017#comment-14538017
 ] 

Akira AJISAKA commented on YARN-3587:
-

Agree with [~djp]. Late +1 from me. Thanks [~djp], [~jianhe], and [~gliptak] 
for contribution!

 Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.
 -

 Key: YARN-3587
 URL: https://issues.apache.org/jira/browse/YARN-3587
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Akira AJISAKA
Assignee: Gabor Liptak
Priority: Minor
  Labels: newbie
 Fix For: 2.8.0

 Attachments: YARN-3587.1.patch, YARN-3587.patch


 In RMDelegationTokenSecretManager and TimelineDelegationTokenSecretManager,  
 the javadoc of the constructor is as follows:
 {code}
  /**
   * Create a secret manager
   * @param delegationKeyUpdateInterval the number of seconds for rolling new
   *        secret keys.
   * @param delegationTokenMaxLifetime the maximum lifetime of the delegation
   *        tokens
   * @param delegationTokenRenewInterval how often the tokens must be renewed
   * @param delegationTokenRemoverScanInterval how often the tokens are scanned
   *        for expired tokens
   */
 {code}
 1. the number of seconds should be the number of milliseconds.
 2. It's better to add time unit to the description of other parameters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3587) Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.

2015-05-11 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538030#comment-14538030
 ] 

Junping Du commented on YARN-3587:
--

Thanks [~ajisakaa]! :)

 Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.
 -

 Key: YARN-3587
 URL: https://issues.apache.org/jira/browse/YARN-3587
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Akira AJISAKA
Assignee: Gabor Liptak
Priority: Minor
  Labels: newbie
 Fix For: 2.8.0

 Attachments: YARN-3587.1.patch, YARN-3587.patch


 In RMDelegationTokenSecretManager and TimelineDelegationTokenSecretManager,  
 the javadoc of the constructor is as follows:
 {code}
  /**
   * Create a secret manager
   * @param delegationKeyUpdateInterval the number of seconds for rolling new
   *        secret keys.
   * @param delegationTokenMaxLifetime the maximum lifetime of the delegation
   *        tokens
   * @param delegationTokenRenewInterval how often the tokens must be renewed
   * @param delegationTokenRemoverScanInterval how often the tokens are scanned
   *        for expired tokens
   */
 {code}
 1. the number of seconds should be the number of milliseconds.
 2. It's better to add time unit to the description of other parameters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3587) Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.

2015-05-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538034#comment-14538034
 ] 

Hudson commented on YARN-3587:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #192 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/192/])
YARN-3587. Fix the javadoc of DelegationTokenSecretManager in yarn, etc. 
projects. Contributed by Gabor Liptak. (junping_du: rev 
7e543c27fa2881aa65967be384a6203bd5b2304f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineDelegationTokenSecretManagerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMDelegationTokenSecretManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/token/delegation/DelegationTokenSecretManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JHSDelegationTokenSecretManager.java


 Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.
 -

 Key: YARN-3587
 URL: https://issues.apache.org/jira/browse/YARN-3587
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Akira AJISAKA
Assignee: Gabor Liptak
Priority: Minor
  Labels: newbie
 Fix For: 2.8.0

 Attachments: YARN-3587.1.patch, YARN-3587.patch


 In RMDelegationTokenSecretManager and TimelineDelegationTokenSecretManager,  
 the javadoc of the constructor is as follows:
 {code}
  /**
   * Create a secret manager
   * @param delegationKeyUpdateInterval the number of seconds for rolling new
   *        secret keys.
   * @param delegationTokenMaxLifetime the maximum lifetime of the delegation
   *        tokens
   * @param delegationTokenRenewInterval how often the tokens must be renewed
   * @param delegationTokenRemoverScanInterval how often the tokens are scanned
   *        for expired tokens
   */
 {code}
 1. the number of seconds should be the number of milliseconds.
 2. It's better to add time unit to the description of other parameters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3276) Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings

2015-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537978#comment-14537978
 ] 

Hadoop QA commented on YARN-3276:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 36s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   9m 12s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 49s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 20s | The applied patch generated  2 
new checkstyle issues (total was 105, now 107). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 42s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 50s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 57s | Tests passed in 
hadoop-yarn-common. |
| | |  47m 14s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12731922/YARN-3276-YARN-2928.v5.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / b3b791b |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/7862/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7862/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7862/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7862/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7862/console |


This message was automatically generated.

 Refactor and fix null casting in some map cast for TimelineEntity (old and 
 new) and fix findbug warnings
 

 Key: YARN-3276
 URL: https://issues.apache.org/jira/browse/YARN-3276
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3276-YARN-2928.v3.patch, 
 YARN-3276-YARN-2928.v4.patch, YARN-3276-YARN-2928.v5.patch, 
 YARN-3276-v2.patch, YARN-3276-v3.patch, YARN-3276.patch


 Per discussion in YARN-3087, we need to refactor some similar logic to cast 
 map to hashmap and get rid of NPE issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3360) Add JMX metrics to TimelineDataManager

2015-05-11 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3360:
-
Attachment: YARN-3360.002.patch

Updated patch to trunk.

 Add JMX metrics to TimelineDataManager
 --

 Key: YARN-3360
 URL: https://issues.apache.org/jira/browse/YARN-3360
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
  Labels: BB2015-05-TBR
 Attachments: YARN-3360.001.patch, YARN-3360.002.patch


 The TimelineDataManager currently has no metrics, outside of the standard JVM 
 metrics.  It would be very useful to at least log basic counts of method 
 calls, time spent in those calls, and number of entities/events involved.
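 For illustration only, such counters and timings could be exposed through
 Hadoop's metrics2 framework, which publishes registered sources over JMX. The
 class name, metric names, and registration below are assumptions for this
 sketch, not the actual patch:
 {code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "Metrics for the timeline data manager", context = "yarn")
public class TimelineDataManagerMetrics {
  // Counts of calls, time spent per call, and number of entities involved.
  @Metric("getEntities calls") MutableCounterLong getEntitiesOps;
  @Metric("getEntities call time") MutableRate getEntitiesTime;
  @Metric("Entities returned by getEntities") MutableCounterLong getEntitiesTotal;

  public static TimelineDataManagerMetrics create() {
    // Registering the annotated source makes the metrics visible over JMX.
    return DefaultMetricsSystem.instance().register(
        "TimelineDataManagerMetrics", "Metrics for the timeline data manager",
        new TimelineDataManagerMetrics());
  }
}
 {code}
 A caller would then do something like {{getEntitiesOps.incr();}} and
 {{getEntitiesTime.add(elapsedMillis);}} around each handled request.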



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3587) Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.

2015-05-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538046#comment-14538046
 ] 

Hudson commented on YARN-3587:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2140 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2140/])
YARN-3587. Fix the javadoc of DelegationTokenSecretManager in yarn, etc. 
projects. Contributed by Gabor Liptak. (junping_du: rev 
7e543c27fa2881aa65967be384a6203bd5b2304f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMDelegationTokenSecretManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineDelegationTokenSecretManagerService.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/token/delegation/DelegationTokenSecretManager.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JHSDelegationTokenSecretManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java


 Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.
 -

 Key: YARN-3587
 URL: https://issues.apache.org/jira/browse/YARN-3587
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Akira AJISAKA
Assignee: Gabor Liptak
Priority: Minor
  Labels: newbie
 Fix For: 2.8.0

 Attachments: YARN-3587.1.patch, YARN-3587.patch


 In RMDelegationTokenSecretManager and TimelineDelegationTokenSecretManager,  
 the javadoc of the constructor is as follows:
 {code}
  /**
   * Create a secret manager
   * @param delegationKeyUpdateInterval the number of seconds for rolling new
   *        secret keys.
   * @param delegationTokenMaxLifetime the maximum lifetime of the delegation
   *        tokens
   * @param delegationTokenRenewInterval how often the tokens must be renewed
   * @param delegationTokenRemoverScanInterval how often the tokens are scanned
   *        for expired tokens
   */
 {code}
 1. the number of seconds should be the number of milliseconds.
 2. It's better to add time unit to the description of other parameters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-05-11 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538058#comment-14538058
 ] 

Junping Du commented on YARN-3044:
--

Sorry for coming to this late. The latest patch LGTM too. [~sjlee0], feel free 
to go ahead and commit this!
However, regarding [~vinodkv]'s comment ("We can take a dual pronged approach 
here? That or we make the RM-publisher itself a distributed push."), it sounds 
reasonable to me but hasn't been fully addressed in this JIRA. Shall we open a 
new JIRA for further discussion on it?

 [Event producers] Implement RM writing app lifecycle events to ATS
 --

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
  Labels: BB2015-05-TBR
 Attachments: YARN-3044-YARN-2928.004.patch, 
 YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, 
 YARN-3044-YARN-2928.007.patch, YARN-3044.20150325-1.patch, 
 YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch


 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3360) Add JMX metrics to TimelineDataManager

2015-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538078#comment-14538078
 ] 

Hadoop QA commented on YARN-3360:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 39s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   7m 32s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 34s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 28s | The applied patch generated  
19 new checkstyle issues (total was 7, now 26). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 47s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   3m  8s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| | |  38m 45s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12731939/YARN-3360.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7e543c2 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/7863/artifact/patchprocess/diffcheckstylehadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7863/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7863/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7863/console |


This message was automatically generated.

 Add JMX metrics to TimelineDataManager
 --

 Key: YARN-3360
 URL: https://issues.apache.org/jira/browse/YARN-3360
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
  Labels: BB2015-05-TBR
 Attachments: YARN-3360.001.patch, YARN-3360.002.patch


 The TimelineDataManager currently has no metrics, outside of the standard JVM 
 metrics.  It would be very useful to at least log basic counts of method 
 calls, time spent in those calls, and number of entities/events involved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538367#comment-14538367
 ] 

Wangda Tan commented on YARN-3434:
--

Thanks Allen! Trying it.

 Interaction between reservations and userlimit can result in significant ULF 
 violation
 --

 Key: YARN-3434
 URL: https://issues.apache.org/jira/browse/YARN-3434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Fix For: 2.8.0

 Attachments: YARN-3434-branch2.7.patch, YARN-3434.patch, 
 YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, 
 YARN-3434.patch, YARN-3434.patch


 ULF was set to 1.0
 User was able to consume 1.4X queue capacity.
 It looks like when this application launched, it reserved about 1000 
 containers, 8G each, within about 5 seconds. I think this allowed the 
 logic in assignToUser() to allow the userlimit to be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3170) YARN architecture document needs updating

2015-05-11 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538242#comment-14538242
 ] 

Brahma Reddy Battula commented on YARN-3170:


[~aw] Thanks for taking a look at this issue. Updated the patch based on your 
comments; kindly review. Let me know if any other rework is needed in the 
second paragraph (mainly the first line).

 YARN architecture document needs updating
 -

 Key: YARN-3170
 URL: https://issues.apache.org/jira/browse/YARN-3170
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Allen Wittenauer
Assignee: Brahma Reddy Battula
  Labels: BB2015-05-TBR
 Attachments: YARN-3170-002.patch, YARN-3170-003.patch, 
 YARN-3170-004.patch, YARN-3170-005.patch, YARN-3170.patch


 The marketing paragraph at the top, NextGen MapReduce, etc are all 
 marketing rather than actual descriptions. It also needs some general 
 updates, esp given it reads as though 0.23 was just released yesterday.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3595) Performance optimization using connection cache of Phoenix timeline writer

2015-05-11 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538341#comment-14538341
 ] 

Li Lu commented on YARN-3595:
-

Hi [~sjlee0], thanks for the suggestions. I think you're right that most 
complexities come from having a cache rather than a pool for those connections. 
I'll look into alternative solutions. 

 Performance optimization using connection cache of Phoenix timeline writer
 --

 Key: YARN-3595
 URL: https://issues.apache.org/jira/browse/YARN-3595
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu

 The story about the connection cache in Phoenix timeline storage is a little 
 bit long. In YARN-3033 we planned to have a shared writer layer for all 
 collectors in the same collector manager, so that we can better reuse the 
 same heavy-weight storage-layer connection; this is friendlier to conventional 
 storage-layer connections, which are typically heavy-weight. Phoenix, on the 
 other hand, implements its own connection interface layer to be light-weight 
 and thread-unsafe. To make these connections work with our multiple-collector, 
 single-writer model, we're adding a thread-indexed connection cache. However, 
 many performance-critical factors are yet to be tested. 
 In this JIRA we're tracking performance optimization efforts around this 
 connection cache. Previously we had a draft, but there was one implementation 
 challenge with cache evictions: there may be races between the Guava cache's 
 removal listener calls (which close the connection) and normal references to 
 the connection. We need to carefully define the way they synchronize. 
 Performance-wise, at the very beginning stage we may need to understand:
 # whether the current, thread-based indexing is an appropriate approach, or we 
 can use some better way to index the connections; 
 # the best size of the cache, presumably as the proposed default value of a 
 configuration; 
 # how long we need to preserve a connection in the cache. 
 Please feel free to add to this list.
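 Purely as an illustration (not the actual writer code), a thread-indexed
 connection cache built on Guava might look like the sketch below. The JDBC URL
 and sizes are assumptions, and the race described above shows up here because
 the removal listener may close a connection another thread has just looked up:
 {code}
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.RemovalListener;
import com.google.common.cache.RemovalNotification;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

public class PhoenixConnectionCacheSketch {
  private final LoadingCache<Long, Connection> connectionCache =
      CacheBuilder.newBuilder()
          .maximumSize(32)                         // the "best size" question above
          .expireAfterAccess(10, TimeUnit.MINUTES) // how long to keep a connection
          .removalListener(new RemovalListener<Long, Connection>() {
            @Override
            public void onRemoval(RemovalNotification<Long, Connection> n) {
              try {
                n.getValue().close();   // may race with a writer still using it
              } catch (SQLException e) {
                // log and ignore in this sketch
              }
            }
          })
          .build(new CacheLoader<Long, Connection>() {
            @Override
            public Connection load(Long threadId) throws Exception {
              // Assumed JDBC URL; the real writer builds it from configuration.
              return DriverManager.getConnection("jdbc:phoenix:localhost");
            }
          });

  /** Thread-based indexing: each writer thread gets "its own" connection. */
  public Connection getConnectionForCurrentThread() throws ExecutionException {
    return connectionCache.get(Thread.currentThread().getId());
  }
}
 {code}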



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

2015-05-11 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538466#comment-14538466
 ] 

Vinod Kumar Vavilapalli commented on YARN-3134:
---

Tx folks, this is great progress!

 [Storage implementation] Exploiting the option of using Phoenix to access 
 HBase backend
 ---

 Key: YARN-3134
 URL: https://issues.apache.org/jira/browse/YARN-3134
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Li Lu
 Fix For: YARN-2928

 Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, 
 YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, 
 YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, 
 YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, 
 YARN-3134-YARN-2928.003.patch, YARN-3134-YARN-2928.004.patch, 
 YARN-3134-YARN-2928.005.patch, YARN-3134-YARN-2928.006.patch, 
 YARN-3134-YARN-2928.007.patch, YARN-3134DataSchema.pdf, 
 hadoop-zshen-nodemanager-d-128-95-184-84.dhcp4.washington.edu.out


 Quote the introduction on Phoenix web page:
 {code}
 Apache Phoenix is a relational database layer over HBase delivered as a 
 client-embedded JDBC driver targeting low latency queries over HBase data. 
 Apache Phoenix takes your SQL query, compiles it into a series of HBase 
 scans, and orchestrates the running of those scans to produce regular JDBC 
 result sets. The table metadata is stored in an HBase table and versioned, 
 such that snapshot queries over prior versions will automatically use the 
 correct schema. Direct use of the HBase API, along with coprocessors and 
 custom filters, results in performance on the order of milliseconds for small 
 queries, or seconds for tens of millions of rows.
 {code}
 It may simplify how our implementation reads/writes data from/to HBase, and 
 makes it easy to build indexes and compose complex queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3616) determine how to generate YARN container events

2015-05-11 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538303#comment-14538303
 ] 

Naganarasimha G R commented on YARN-3616:
-

I would like to continue working on this issue :).
Also, to capture one important point from [~Vinodkv]'s review:
bq. The missing dots occur when a container's life-cycle ends either on the RM 
or the AM. We can take a dual pronged approach here? That or we make the 
RM-publisher itself a distributed push.
IMO the dual-pronged approach would be better: we can rely on NMs to post the 
normal life-cycle events, and in the rare cases the NM cannot handle, the RM 
publishes the events directly to ATS.
Also, a distributed push might not work here, because in the cases Vinod 
mentioned the NM may not be able to handle publishing: the TimelineCollector 
may not have been created, since no container was created on the NM side for 
that app. Correct me if I am wrong.

 determine how to generate YARN container events
 ---

 Key: YARN-3616
 URL: https://issues.apache.org/jira/browse/YARN-3616
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Naganarasimha G R

 The initial design called for the node manager to write YARN container events 
 to take advantage of the distributed writes. RM acting as a sole writer of 
 all YARN container events would have significant scalability problems.
 Still, there are some types of events that are not captured by the NM. The 
 current implementation has both: RM writing container events and NM writing 
 container events.
 We need to sort this out, and decide how we can write all needed container 
 events in a scalable manner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-05-11 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538346#comment-14538346
 ] 

Allen Wittenauer commented on YARN-3434:


You can run test-patch.sh locally and specify the branch using --branch.

 Interaction between reservations and userlimit can result in significant ULF 
 violation
 --

 Key: YARN-3434
 URL: https://issues.apache.org/jira/browse/YARN-3434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Fix For: 2.8.0

 Attachments: YARN-3434-branch2.7.patch, YARN-3434.patch, 
 YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, 
 YARN-3434.patch, YARN-3434.patch


 ULF was set to 1.0
 User was able to consume 1.4X queue capacity.
 It looks like when this application launched, it reserved about 1000 
 containers, 8G each, within about 5 seconds. I think this allowed the 
 logic in assignToUser() to allow the userlimit to be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3617) Fix unused variable to get CPU frequency on Windows systems

2015-05-11 Thread Georg Berendt (JIRA)
Georg Berendt created YARN-3617:
---

 Summary: Fix unused variable to get CPU frequency on Windows 
systems
 Key: YARN-3617
 URL: https://issues.apache.org/jira/browse/YARN-3617
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
 Environment: Windows 7 x64 SP1
Reporter: Georg Berendt
Priority: Minor


In the class 'WindowsResourceCalculatorPlugin.java' of the YARN project, there 
is an unused variable for CPU frequency.

  /** {@inheritDoc} */
  @Override
  public long getCpuFrequency() {
    refreshIfNeeded();
    return -1;
  }

Please change '-1' to use 'cpuFrequencyKhz'.

org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java
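In other words, the suggested change would look roughly like this (a sketch 
based on the description above; it assumes refreshIfNeeded() populates the 
cpuFrequencyKhz field):
{code}
  /** {@inheritDoc} */
  @Override
  public long getCpuFrequency() {
    refreshIfNeeded();
    return cpuFrequencyKhz;  // return the refreshed value instead of a hard-coded -1
  }
{code}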





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3618) Fix unused variable to get CPU frequency on Windows systems

2015-05-11 Thread Georg Berendt (JIRA)
Georg Berendt created YARN-3618:
---

 Summary: Fix unused variable to get CPU frequency on Windows 
systems
 Key: YARN-3618
 URL: https://issues.apache.org/jira/browse/YARN-3618
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
 Environment: Windows 7 x64 SP1
Reporter: Georg Berendt
Priority: Minor


In the class 'WindowsResourceCalculatorPlugin.java' of the YARN project, there 
is an unused variable for CPU frequency.

  /** {@inheritDoc} */
  @Override
  public long getCpuFrequency() {
    refreshIfNeeded();
    return -1;
  }

Please change '-1' to use 'cpuFrequencyKhz'.

org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3621) FairScheduler doesn't count AM vcores towards max-share

2015-05-11 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-3621:
--

 Summary: FairScheduler doesn't count AM vcores towards max-share
 Key: YARN-3621
 URL: https://issues.apache.org/jira/browse/YARN-3621
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.1
Reporter: Karthik Kambatla


FairScheduler seems to not count AM vcores towards max-vcores. On a queue with 
maxVcores set to 1, I am able to run a sleep job. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3622) Enable application client to communicate with new timeline service

2015-05-11 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3622:
-

 Summary: Enable application client to communicate with new 
timeline service
 Key: YARN-3622
 URL: https://issues.apache.org/jira/browse/YARN-3622
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


A YARN application has a client and an AM. We have a story to make 
TimelineClient work inside the AM for v2, but not for the client. 
TimelineClient inside the app client needs to be taken care of too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps

2015-05-11 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3505:

Attachment: YARN-3505.4.patch

 Node's Log Aggregation Report with SUCCEED should not cached in RMApps
 --

 Key: YARN-3505
 URL: https://issues.apache.org/jira/browse/YARN-3505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Affects Versions: 2.8.0
Reporter: Junping Du
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-3505.1.patch, YARN-3505.2.patch, 
 YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch


 Per discussions in YARN-1402, we shouldn't cache all node's log aggregation 
 reports in RMApps for always, especially for those finished with SUCCEED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3624) ApplicationHistoryServer reverses the order of the filters it gets

2015-05-11 Thread Mit Desai (JIRA)
Mit Desai created YARN-3624:
---

 Summary: ApplicationHistoryServer reverses the order of the 
filters it gets
 Key: YARN-3624
 URL: https://issues.apache.org/jira/browse/YARN-3624
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai


ApplicationHistoryServer should not alter the order in which it gets the filter 
chain. Additional filters should be added at the end of the chain.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3624) ApplicationHistoryServer reverses the order of the filters it gets

2015-05-11 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-3624:

Attachment: YARN-3624.patch

attaching the patch

 ApplicationHistoryServer reverses the order of the filters it gets
 --

 Key: YARN-3624
 URL: https://issues.apache.org/jira/browse/YARN-3624
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: YARN-3624.patch


 ApplicationHistoryServer should not alter the order in which it gets the 
 filter chain. Additional filters should be added at the end of the chain.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538740#comment-14538740
 ] 

Wangda Tan commented on YARN-3434:
--

Ran it locally, all tests passed; committing.

 Interaction between reservations and userlimit can result in significant ULF 
 violation
 --

 Key: YARN-3434
 URL: https://issues.apache.org/jira/browse/YARN-3434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Fix For: 2.8.0

 Attachments: YARN-3434-branch2.7.patch, YARN-3434.patch, 
 YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, 
 YARN-3434.patch, YARN-3434.patch


 ULF was set to 1.0
 User was able to consume 1.4X queue capacity.
 It looks like when this application launched, it reserved about 1000 
 containers, 8G each, within about 5 seconds. I think this allowed the 
 logic in assignToUser() to allow the userlimit to be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-05-11 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538437#comment-14538437
 ] 

Zhijie Shen commented on YARN-3044:
---

Sorry to put in my comments at the last minute:

1. I'm still not sure why it is necessary to have RMContainerEntity. Whether 
the container entity comes from the RM or the NM, it's about the container's 
info. Is there any reason we want to differentiate the two? On the reader side, 
if I want to list all containers of an app, should I return RMContainerEntity 
or ContainerEntity? I'm inclined to have only ContainerEntity, but the RM and 
NM may put different info/events on it based on their knowledge.

2. Shouldn't the v1 and v2 publishers differ only at publishEvent? However, it 
seems that we duplicate more code than that. Perhaps defining and implementing 
SystemMetricsEvent.toTimelineEvent can further clean up the code.

3. I saw that v2 is going to send the config, but where is the config coming 
from? Did we conclude who sends the config and how? IAC, sending the config 
seems to be half done. Also, we can use {{entity.addConfigs(event.getConfig());}}; 
there is no need to iterate over the config collection and put each config 
one-by-one.

4. yarn.system-metrics-publisher.rm.publish.container-metrics -> 
yarn.rm.system-metrics-publisher.emit-container-events?
{code}
  public static final String RM_PUBLISH_CONTAINER_METRICS_ENABLED = YARN_PREFIX
      + "system-metrics-publisher.rm.publish.container-metrics";
  public static final boolean DEFAULT_RM_PUBLISH_CONTAINER_METRICS_ENABLED =
      false;
{code}
Moreover, I also think we should not have yarn.system-metrics-publisher.enabled 
either, and should reuse the existing config. This is not limited to the RM 
metrics publisher but applies to all existing ATS services. IMHO, the better 
practice is to reuse the existing config, and we can have a global config (or 
env var) timeline-service.version to determine whether the service is enabled 
with the v1 or v2 implementation. Anyway, it's a separate problem; I'll file a 
separate jira for it.

5. Methods/inner classes in SystemMetricsPublisher don't need to be changed to 
public. Default (package-private) access is enough to reach them?

 [Event producers] Implement RM writing app lifecycle events to ATS
 --

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
  Labels: BB2015-05-TBR
 Attachments: YARN-3044-YARN-2928.004.patch, 
 YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, 
 YARN-3044-YARN-2928.007.patch, YARN-3044.20150325-1.patch, 
 YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch


 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException

2015-05-11 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-3619:
--

Assignee: Karthik Kambatla

 ContainerMetrics unregisters during getMetrics and leads to 
 ConcurrentModificationException
 ---

 Key: YARN-3619
 URL: https://issues.apache.org/jira/browse/YARN-3619
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Jason Lowe
Assignee: Karthik Kambatla

 ContainerMetrics is able to unregister itself during the getMetrics method, 
 but that method can be called by MetricsSystemImpl.sampleMetrics which is 
 trying to iterate the sources.  This leads to a 
 ConcurrentModificationException log like this:
 {noformat}
 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN 
 impl.MetricsSystemImpl: java.util.ConcurrentModificationException
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException

2015-05-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538581#comment-14538581
 ] 

Jason Lowe commented on YARN-3619:
--

This appears to have been caused by YARN-2984.  [~kasha] would you mind taking 
a look?

 ContainerMetrics unregisters during getMetrics and leads to 
 ConcurrentModificationException
 ---

 Key: YARN-3619
 URL: https://issues.apache.org/jira/browse/YARN-3619
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Jason Lowe

 ContainerMetrics is able to unregister itself during the getMetrics method, 
 but that method can be called by MetricsSystemImpl.sampleMetrics which is 
 trying to iterate the sources.  This leads to a 
 ConcurrentModificationException log like this:
 {noformat}
 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN 
 impl.MetricsSystemImpl: java.util.ConcurrentModificationException
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3620) MetricsSystemImpl fails to show backtrace when an error occurs

2015-05-11 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-3620:


 Summary: MetricsSystemImpl fails to show backtrace when an error 
occurs
 Key: YARN-3620
 URL: https://issues.apache.org/jira/browse/YARN-3620
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jason Lowe
Assignee: Jason Lowe


While investigating YARN-3619 it was frustrating that MetricsSystemImpl was 
logging a ConcurrentModificationException but without any backtrace.  Logging a 
backtrace would be very beneficial to tracking down the cause of the problem.
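
For illustration, the difference is simply whether the throwable is passed to the 
logger; a minimal, generic sketch (not the actual MetricsSystemImpl code) using 
commons-logging:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Generic sketch: passing the exception as the second argument makes the logger
// print the full stack trace instead of only the exception's message.
class BacktraceLoggingSketch {
  private static final Log LOG = LogFactory.getLog(BacktraceLoggingSketch.class);

  void sample(Runnable source) {
    try {
      source.run();
    } catch (RuntimeException e) {
      LOG.warn("Error sampling metrics", e);  // keeps the backtrace
    }
  }
}
{code}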



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException

2015-05-11 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-3619:


 Summary: ContainerMetrics unregisters during getMetrics and leads 
to ConcurrentModificationException
 Key: YARN-3619
 URL: https://issues.apache.org/jira/browse/YARN-3619
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Jason Lowe


ContainerMetrics is able to unregister itself during the getMetrics method, but 
that method can be called by MetricsSystemImpl.sampleMetrics which is trying to 
iterate the sources.  This leads to a ConcurrentModificationException log like 
this:
{noformat}
2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN 
impl.MetricsSystemImpl: java.util.ConcurrentModificationException
{noformat}
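
For illustration, a generic (non-YARN) snippet that reproduces the same failure 
mode of modifying a collection while it is being iterated:
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Generic demonstration only; this is not the ContainerMetrics/MetricsSystemImpl code.
public class CmeDemo {
  public static void main(String[] args) {
    List<String> sources = new ArrayList<String>();
    sources.add("containerA");
    sources.add("containerB");
    sources.add("containerC");

    Iterator<String> it = sources.iterator();
    while (it.hasNext()) {
      String source = it.next();
      // Simulates a source unregistering itself mid-iteration, the way
      // ContainerMetrics can unregister from within getMetrics().
      sources.remove(source);  // the next it.next() throws ConcurrentModificationException
    }
  }
}
{code}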



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2921) MockRM#waitForState methods can be too slow and flaky

2015-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538956#comment-14538956
 ] 

Wangda Tan commented on YARN-2921:
--

Hi [~ozawa],
Some comments:
- In MockAM.waitForState, I don't quite understand the change: 1. Why is 
minWaitMSec needed? 2. Why fail the method when {{waitedMsecs >= timeoutMsecs}} 
is true? I think it should check the current state against the expected state.
- In the two MockRM.waitForState methods, I think we should also check 
app.getState() instead of the elapsed time, correct?
- In TestRMRestart, you can use GenericTestUtils.waitFor instead (see the 
sketch below).
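
A minimal sketch of the GenericTestUtils.waitFor approach (assuming the Hadoop 
2.x signature {{waitFor(Supplier<Boolean>, int, int)}}; the helper name here is 
made up):
{code}
import java.util.concurrent.TimeoutException;

import com.google.common.base.Supplier;

import org.apache.hadoop.test.GenericTestUtils;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppState;

// Sketch only: poll the app state every 100 ms and time out after 20 s,
// instead of sleeping for fixed 1-2 second intervals.
class WaitForStateSketch {
  static void waitForAppState(final RMApp app, final RMAppState expected)
      throws TimeoutException, InterruptedException {
    GenericTestUtils.waitFor(new Supplier<Boolean>() {
      @Override
      public Boolean get() {
        return app.getState() == expected;
      }
    }, 100, 20000);
  }
}
{code}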

 MockRM#waitForState methods can be too slow and flaky
 -

 Key: YARN-2921
 URL: https://issues.apache.org/jira/browse/YARN-2921
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Affects Versions: 2.6.0, 2.7.0
Reporter: Karthik Kambatla
Assignee: Tsuyoshi Ozawa
 Attachments: YARN-2921.001.patch, YARN-2921.002.patch, 
 YARN-2921.003.patch, YARN-2921.004.patch, YARN-2921.005.patch, 
 YARN-2921.006.patch, YARN-2921.007.patch


 MockRM#waitForState methods currently sleep for too long (2 seconds and 1 
 second). This leads to slow tests and sometimes failures if the 
 App/AppAttempt moves to another state. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once

2015-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538961#comment-14538961
 ] 

Wangda Tan commented on YARN-3489:
--

Committing.

 RMServerUtils.validateResourceRequests should only obtain queue info once
 -

 Key: YARN-3489
 URL: https://issues.apache.org/jira/browse/YARN-3489
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Varun Saxena
  Labels: BB2015-05-RFC
 Attachments: YARN-3489.01.patch, YARN-3489.02.patch, 
 YARN-3489.03.patch


 Since the label support was added we now get the queue info for each request 
 being validated in SchedulerUtils.validateResourceRequest.  If 
 validateResourceRequests needs to validate a lot of requests at a time (e.g.: 
 large cluster with lots of varied locality in the requests) then it will get 
 the queue info for each request.  Since we build the queue info this 
 generates a lot of unnecessary garbage, as the queue isn't changing between 
 requests.  We should grab the queue info once and pass it down rather than 
 building it again for each request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once

2015-05-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539007#comment-14539007
 ] 

Hudson commented on YARN-3489:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7800 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7800/])
YARN-3489. RMServerUtils.validateResourceRequests should only obtain queue info 
once. (Varun Saxena via wangda) (wangda: rev 
d6f6741296639a73f5306e3ebefec84a40ca03e5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java


 RMServerUtils.validateResourceRequests should only obtain queue info once
 -

 Key: YARN-3489
 URL: https://issues.apache.org/jira/browse/YARN-3489
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Varun Saxena
  Labels: BB2015-05-RFC
 Attachments: YARN-3489.01.patch, YARN-3489.02.patch, 
 YARN-3489.03.patch


 Since the label support was added we now get the queue info for each request 
 being validated in SchedulerUtils.validateResourceRequest.  If 
 validateResourceRequests needs to validate a lot of requests at a time (e.g.: 
 large cluster with lots of varied locality in the requests) then it will get 
 the queue info for each request.  Since we build the queue info this 
 generates a lot of unnecessary garbage, as the queue isn't changing between 
 requests.  We should grab the queue info once and pass it down rather than 
 building it again for each request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3617) Fix unused variable to get CPU frequency on Windows systems

2015-05-11 Thread J.Andreina (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539018#comment-14539018
 ] 

J.Andreina commented on YARN-3617:
--

Thanks [~xafero] for reporting this issue. If you have already started working 
on this, please reassign it to yourself.

 Fix unused variable to get CPU frequency on Windows systems
 ---

 Key: YARN-3617
 URL: https://issues.apache.org/jira/browse/YARN-3617
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
 Environment: Windows 7 x64 SP1
Reporter: Georg Berendt
Assignee: J.Andreina
Priority: Minor
   Original Estimate: 1h
  Remaining Estimate: 1h

 In the class 'WindowsResourceCalculatorPlugin.java' of the YARN project, 
 there is an unused variable for CPU frequency.
  /** {@inheritDoc} */
  @Override
  public long getCpuFrequency() {
    refreshIfNeeded();
    return -1;
  }
 Please change '-1' to use 'cpuFrequencyKhz'.
 org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-1231) Fix test cases that will hit max- am-used-resources-percent limit after YARN-276

2015-05-11 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou resolved YARN-1231.
-
Resolution: Won't Fix

YARN-2637 has fixed the problem described in YARN-276, so this ticket no longer 
needs to be fixed.

 Fix test cases that will hit max- am-used-resources-percent limit after 
 YARN-276
 

 Key: YARN-1231
 URL: https://issues.apache.org/jira/browse/YARN-1231
 Project: Hadoop YARN
  Issue Type: Task
Affects Versions: 2.1.1-beta
Reporter: Nemon Lou
Assignee: Nemon Lou
  Labels: test
 Attachments: YARN-1231.patch


 Use a separate jira to fix YARN's test cases that will fail by hitting max- 
 am-used-resources-percent limit after YARN-276.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)

2015-05-11 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539061#comment-14539061
 ] 

Mit Desai commented on YARN-2900:
-

I was stuck on something else. I'll post an update on this by tomorrow.

 Application (Attempt and Container) Not Found in AHS results in Internal 
 Server Error (500)
 ---

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900-b2.patch, YARN-2900.patch, YARN-2900.patch, 
 YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, 
 YARN-2900.patch, YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3623) Having the config to indicate the timeline service version

2015-05-11 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3623:
-

 Summary: Having the config to indicate the timeline service version
 Key: YARN-3623
 URL: https://issues.apache.org/jira/browse/YARN-3623
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


So far the RM, MR AM, and DA AM have added/changed new configs to enable writing 
timeline data to the v2 server. It would be good to have a YARN 
timeline-service.version config, similar to timeline-service.enable, to indicate 
the version of the timeline service running with the given YARN cluster. This 
makes it easier for users to move from v1 to v2: they don't need to change the 
existing configs, only switch this config from v1 to v2, and each framework 
doesn't need its own v1/v2 config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps

2015-05-11 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538716#comment-14538716
 ] 

Xuan Gong commented on YARN-3505:
-

Uploaded a new patch to address all the comments.


 Node's Log Aggregation Report with SUCCEED should not cached in RMApps
 --

 Key: YARN-3505
 URL: https://issues.apache.org/jira/browse/YARN-3505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Affects Versions: 2.8.0
Reporter: Junping Du
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-3505.1.patch, YARN-3505.2.patch, 
 YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch


 Per discussions in YARN-1402, we shouldn't cache all node's log aggregation 
 reports in RMApps for always, especially for those finished with SUCCEED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.

2015-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539174#comment-14539174
 ] 

Hadoop QA commented on YARN-3543:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 40s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 7 new or modified test files. |
| {color:green}+1{color} | javac |   7m 33s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 34s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   3m 27s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  7s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 40s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   6m 25s | The patch appears to introduce 3 
new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   6m 49s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   1m 57s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   3m  7s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |  52m  7s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 109m 13s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-applicationhistoryservice |
|  |  Redundant nullcheck of app, which is known to be non-null in 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getAMContainer(ApplicationAttemptId)
  Redundant null check at ApplicationHistoryManagerImpl.java:is known to be 
non-null in 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getAMContainer(ApplicationAttemptId)
  Redundant null check at ApplicationHistoryManagerImpl.java:[line 96] |
|  |  Redundant nullcheck of app, which is known to be non-null in 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ContainerId)
  Redundant null check at ApplicationHistoryManagerImpl.java:is known to be 
non-null in 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ContainerId)
  Redundant null check at ApplicationHistoryManagerImpl.java:[line 203] |
|  |  Redundant nullcheck of app, which is known to be non-null in 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainers(ApplicationAttemptId)
  Redundant null check at ApplicationHistoryManagerImpl.java:is known to be 
non-null in 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainers(ApplicationAttemptId)
  Redundant null check at ApplicationHistoryManagerImpl.java:[line 235] |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12731571/0003-YARN-3543.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3d28611 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/7872/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-applicationhistoryservice.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7872/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7872/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7872/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7872/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7872/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7872/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 

[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests

2015-05-11 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3529:

Attachment: YARN-3529-YARN-2928.002.patch

New patch to fix the findbugs warnings. 

 Add miniHBase cluster and Phoenix support to ATS v2 unit tests
 --

 Key: YARN-3529
 URL: https://issues.apache.org/jira/browse/YARN-3529
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu
 Attachments: AbstractMiniHBaseClusterTest.java, 
 YARN-3529-YARN-2928.000.patch, YARN-3529-YARN-2928.001.patch, 
 YARN-3529-YARN-2928.002.patch, output_minicluster2.txt


 After we have our HBase and Phoenix writer implementations, we may want to 
 find a way to set up HBase and Phoenix in our unit tests. We need to do this 
 integration before the branch got merged back to trunk. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods

2015-05-11 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3613:

Attachment: YARN-3613-1.patch

Please review the patch.
Removed 2 unused imports.
Test time reduced from ~130 to ~80 sec

 TestContainerManagerSecurity should init and start Yarn cluster in setup 
 instead of individual methods
 --

 Key: YARN-3613
 URL: https://issues.apache.org/jira/browse/YARN-3613
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: nijel
Priority: Minor
  Labels: newbie
 Attachments: YARN-3613-1.patch


 In TestContainerManagerSecurity, individual tests init and start Yarn 
 cluster. This duplication can be avoided by moving that to setup. 
 Further, one could merge the two @Test methods to avoid bringing up another 
 mini-cluster. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server

2015-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539200#comment-14539200
 ] 

Hadoop QA commented on YARN-2556:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   5m 14s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 32s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 19s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 36s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 41s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | mapreduce tests | 105m 46s | Tests passed in 
hadoop-mapreduce-client-jobclient. |
| | | 122m 20s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12732105/YARN-2556.3.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 3d28611 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/7873/artifact/patchprocess/whitespace.txt
 |
| hadoop-mapreduce-client-jobclient test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7873/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7873/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7873/console |


This message was automatically generated.

 Tool to measure the performance of the timeline server
 --

 Key: YARN-2556
 URL: https://issues.apache.org/jira/browse/YARN-2556
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Chang Li
  Labels: BB2015-05-TBR
 Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, 
 YARN-2556.1.patch, YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.patch, 
 yarn2556.patch, yarn2556.patch, yarn2556_wip.patch


 We need to be able to understand the capacity model for the timeline server 
 to give users the tools they need to deploy a timeline server with the 
 correct capacity.
 I propose we create a mapreduce job that can measure timeline server write 
 and read performance. Transactions per second, I/O for both read and write 
 would be a good start.
 This could be done as an example or test job that could be tied into gridmix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException

2015-05-11 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-3619:
--

Assignee: zhihai xu  (was: Karthik Kambatla)

Zhihai pinged me offline mentioning he knows the root cause behind this. [~zxu] 
- assigning this to you. 

 ContainerMetrics unregisters during getMetrics and leads to 
 ConcurrentModificationException
 ---

 Key: YARN-3619
 URL: https://issues.apache.org/jira/browse/YARN-3619
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Jason Lowe
Assignee: zhihai xu

 ContainerMetrics is able to unregister itself during the getMetrics method, 
 but that method can be called by MetricsSystemImpl.sampleMetrics which is 
 trying to iterate the sources.  This leads to a 
 ConcurrentModificationException log like this:
 {noformat}
 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN 
 impl.MetricsSystemImpl: java.util.ConcurrentModificationException
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels

2015-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538890#comment-14538890
 ] 

Wangda Tan commented on YARN-3521:
--

Thanks for updating, [~sunilg],
Latest patch LGTM, +1.

 Support return structured NodeLabel objects in REST API when call 
 getClusterNodeLabels
 --

 Key: YARN-3521
 URL: https://issues.apache.org/jira/browse/YARN-3521
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Sunil G
 Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch, 
 0003-YARN-3521.patch, 0004-YARN-3521.patch, 0005-YARN-3521.patch, 
 0006-YARN-3521.patch, 0007-YARN-3521.patch


 In YARN-3413, the yarn cluster CLI returns NodeLabel instead of String; we should 
 make the same change on the REST API side to keep them consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps

2015-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538897#comment-14538897
 ] 

Hadoop QA commented on YARN-3505:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 12s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 50s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 51s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m  7s | The applied patch generated  1 
new checkstyle issues (total was 1, now 2). |
| {color:red}-1{color} | checkstyle |   2m 22s | The applied patch generated  2 
new checkstyle issues (total was 70, now 63). |
| {color:green}+1{color} | whitespace |   0m 21s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 41s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 35s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   0m 21s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |   6m 10s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |  51m 55s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 102m  7s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12732030/YARN-3505.4.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / ea11590 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/7866/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 
https://builds.apache.org/job/PreCommit-YARN-Build/7866/artifact/patchprocess/diffcheckstylehadoop-yarn-server-common.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7866/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7866/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7866/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7866/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7866/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7866/console |


This message was automatically generated.

 Node's Log Aggregation Report with SUCCEED should not cached in RMApps
 --

 Key: YARN-3505
 URL: https://issues.apache.org/jira/browse/YARN-3505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Affects Versions: 2.8.0
Reporter: Junping Du
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-3505.1.patch, YARN-3505.2.patch, 
 YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch


 Per discussions in YARN-1402, we shouldn't cache all node's log aggregation 
 reports in RMApps for always, especially for those finished with SUCCEED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3624) ApplicationHistoryServer reverses the order of the filters it gets

2015-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538898#comment-14538898
 ] 

Hadoop QA commented on YARN-3624:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 48s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 38s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 38s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 27s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 49s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   3m  3s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| | |  38m 59s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12732032/YARN-3624.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 444836b |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7867/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7867/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7867/console |


This message was automatically generated.

 ApplicationHistoryServer reverses the order of the filters it gets
 --

 Key: YARN-3624
 URL: https://issues.apache.org/jira/browse/YARN-3624
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: YARN-3624.patch


 AppliactionHistoryServer should not alter the order in which it gets the 
 filter chain. Additional filters should be added at the end of the chain.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3545) Investigate the concurrency issue with the map of timeline collector

2015-05-11 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3545:

Attachment: YARN-3545-YARN-2928.000.patch

In this patch I'm using a concurrent hash map to replace the synchronized hash 
map. After removing the global lock, we need to consider two cases: concurrent 
putIfAbsent calls, and a putIfAbsent call concurrent with a get call. 

The case of a putIfAbsent call concurrent with a get call is addressed by an 
initialization barrier, since the contention is low. With this solution, in the 
best case each read performs only one volatile read instead of acquiring the 
lock inside the synchronized map. 

The case of multiple concurrent putIfAbsent calls is addressed by speculatively 
allocating a collector and trying to putIfAbsent it into the hash map. The 
caller then calls postPut and publishes the new collector to all readers if the 
putIfAbsent call succeeded (returned null). If the putIfAbsent call failed, 
someone else has already allocated a collector and we need to use that one. To 
speed up this case, I used a fast path so that we only try to allocate a 
collector if there was no collector for the key at the beginning of the method 
(see the sketch below). 

I'd appreciate comments since I may be missing something here...
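
A generic sketch of the speculative-allocation-plus-putIfAbsent pattern described 
above (class and method names are illustrative, not the actual patch code):
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Generic illustration of the pattern, not the YARN-3545 patch itself.
class CollectorMapSketch<K, V> {
  interface Factory<T> { T create(); }

  private final ConcurrentMap<K, V> collectors = new ConcurrentHashMap<K, V>();

  V getOrCreate(K key, Factory<V> factory) {
    // Fast path: if a collector already exists, return it without allocating.
    V existing = collectors.get(key);
    if (existing != null) {
      return existing;
    }
    // Slow path: speculatively allocate, then race via putIfAbsent.
    V candidate = factory.create();
    V raced = collectors.putIfAbsent(key, candidate);
    if (raced == null) {
      // We won the race; this is where the patch would call postPut/publish.
      return candidate;
    }
    // Someone else won the race; discard our candidate and use theirs.
    return raced;
  }
}
{code}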

 Investigate the concurrency issue with the map of timeline collector
 

 Key: YARN-3545
 URL: https://issues.apache.org/jira/browse/YARN-3545
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Li Lu
 Attachments: YARN-3545-YARN-2928.000.patch


 See the discussion in YARN-3390 for details. Let's continue the discussion 
 here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3545) Investigate the concurrency issue with the map of timeline collector

2015-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539001#comment-14539001
 ] 

Hadoop QA commented on YARN-3545:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 11s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 45s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 33s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 43s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   0m 40s | The patch appears to introduce 1 
new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  37m  3s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-timelineservice |
|  |  Spinning on TimelineCollector.initialized in 
org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.initializationBarrier(TimelineCollector)
  At TimelineCollectorManager.java: At TimelineCollectorManager.java:[line 161] 
|
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12732071/YARN-3545-YARN-2928.000.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / b3b791b |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/7870/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-timelineservice.html
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7870/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7870/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7870/console |


This message was automatically generated.

 Investigate the concurrency issue with the map of timeline collector
 

 Key: YARN-3545
 URL: https://issues.apache.org/jira/browse/YARN-3545
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Li Lu
 Attachments: YARN-3545-YARN-2928.000.patch


 See the discussion in YARN-3390 for details. Let's continue the discussion 
 here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3617) Fix unused variable to get CPU frequency on Windows systems

2015-05-11 Thread J.Andreina (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.Andreina reassigned YARN-3617:


Assignee: J.Andreina

 Fix unused variable to get CPU frequency on Windows systems
 ---

 Key: YARN-3617
 URL: https://issues.apache.org/jira/browse/YARN-3617
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
 Environment: Windows 7 x64 SP1
Reporter: Georg Berendt
Assignee: J.Andreina
Priority: Minor
   Original Estimate: 1h
  Remaining Estimate: 1h

 In the class 'WindowsResourceCalculatorPlugin.java' of the YARN project, 
 there is an unused variable for CPU frequency.
  /** {@inheritDoc} */
  @Override
  public long getCpuFrequency() {
    refreshIfNeeded();
    return -1;
  }
 Please change '-1' to use 'cpuFrequencyKhz'.
 org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3493) RM fails to come up with error Failed to load/recover state when mem settings are changed

2015-05-11 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3493:
-
Fix Version/s: 2.7.1

 RM fails to come up with error Failed to load/recover state when  mem 
 settings are changed
 

 Key: YARN-3493
 URL: https://issues.apache.org/jira/browse/YARN-3493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
Reporter: Sumana Sathish
Assignee: Jian He
Priority: Critical
 Fix For: 2.8.0, 2.7.1

 Attachments: YARN-3493.1.patch, YARN-3493.2.patch, YARN-3493.3.patch, 
 YARN-3493.4.patch, YARN-3493.5.patch, yarn-yarn-resourcemanager.log.zip


 RM fails to come up for the following case:
 1. Change yarn.nodemanager.resource.memory-mb and 
 yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
 background and wait for the job to reach running state
 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 
 before the above job completes
 4. Restart RM
 5. RM fails to come up with the below error
 {code:title= RM error for Mem settings changed}
  - RM app submission failed in validating AM resource request for application 
 application_1429094976272_0008
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
 (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 

[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests

2015-05-11 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3529:

Attachment: YARN-3529-YARN-2928.001.patch

New patch addressed [~zjshen]'s comments. I changed the maven organization for 
the dependency information, added one implementation-level configuration for 
setting up connection strings, and now tear down the phoenix server at the end 
of the unit test. 

 Add miniHBase cluster and Phoenix support to ATS v2 unit tests
 --

 Key: YARN-3529
 URL: https://issues.apache.org/jira/browse/YARN-3529
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu
 Attachments: AbstractMiniHBaseClusterTest.java, 
 YARN-3529-YARN-2928.000.patch, YARN-3529-YARN-2928.001.patch, 
 output_minicluster2.txt


 After we have our HBase and Phoenix writer implementations, we may want to 
 find a way to set up HBase and Phoenix in our unit tests. We need to do this 
 integration before the branch got merged back to trunk. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.

2015-05-11 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539069#comment-14539069
 ] 

Rohith commented on YARN-3543:
--

[~vinodkv] Kindly review the updated patch..

 ApplicationReport should be able to tell whether the Application is AM 
 managed or not. 
 ---

 Key: YARN-3543
 URL: https://issues.apache.org/jira/browse/YARN-3543
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.6.0
Reporter: Spandan Dutta
Assignee: Rohith
  Labels: BB2015-05-TBR
 Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 
 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 
 YARN-3543-AH.PNG, YARN-3543-RM.PNG


 Currently we can know whether the application submitted by the user is AM 
 managed from the applicationSubmissionContext. This can be only done  at the 
 time when the user submits the job. We should have access to this info from 
 the ApplicationReport as well so that we can check whether an app is AM 
 managed or not anytime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI

2015-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538878#comment-14538878
 ] 

Wangda Tan commented on YARN-3362:
--

The latest patch LGTM.

 Add node label usage in RM CapacityScheduler web UI
 ---

 Key: YARN-3362
 URL: https://issues.apache.org/jira/browse/YARN-3362
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager, webapp
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: 2015.05.06 Folded Queues.png, 2015.05.06 Queue 
 Expanded.png, 2015.05.07_3362_Queue_Hierarchy.png, 
 2015.05.10_3362_Queue_Hierarchy.png, CSWithLabelsView.png, 
 No-space-between-Active_user_info-and-next-queues.png, Screen Shot 2015-04-29 
 at 11.42.17 AM.png, YARN-3362.20150428-3-modified.patch, 
 YARN-3362.20150428-3.patch, YARN-3362.20150506-1.patch, 
 YARN-3362.20150507-1.patch, YARN-3362.20150510-1.patch, 
 YARN-3362.20150511-1.patch, capacity-scheduler.xml


 We don't have node label usage in the RM CapacityScheduler web UI now; without 
 this, it is hard for users to understand what happened to nodes that have 
 labels assigned to them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put

2015-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538905#comment-14538905
 ] 

Hadoop QA commented on YARN-3625:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 58s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 36s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 45s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 27s | The applied patch generated  1 
new checkstyle issues (total was 6, now 6). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 49s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   3m 12s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| | |  39m 25s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12732038/YARN-3625.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 444836b |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/7868/artifact/patchprocess/diffcheckstylehadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7868/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7868/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7868/console |


This message was automatically generated.

 RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
 --

 Key: YARN-3625
 URL: https://issues.apache.org/jira/browse/YARN-3625
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-3625.1.patch


 RollingLevelDBTimelineStore batches all entities in the same put to improve 
 performance. This causes an error when relating to an entity in the same put 
 however.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)

2015-05-11 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538911#comment-14538911
 ] 

Zhijie Shen commented on YARN-2900:
---

[~mitdesai], have you got the chance to fix {{java.lang.IllegalStateException: 
STREAM}}?

 Application (Attempt and Container) Not Found in AHS results in Internal 
 Server Error (500)
 ---

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900-b2.patch, YARN-2900.patch, YARN-2900.patch, 
 YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, 
 YARN-2900.patch, YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3493) RM fails to come up with error Failed to load/recover state when mem settings are changed

2015-05-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539057#comment-14539057
 ] 

Hudson commented on YARN-3493:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7801 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7801/])
Move YARN-3493 in CHANGES.txt from 2.8 to 2.7.1 (wangda: rev 
3d28611cc6850de129b831158c420f9487103213)
* hadoop-yarn-project/CHANGES.txt


 RM fails to come up with error Failed to load/recover state when  mem 
 settings are changed
 

 Key: YARN-3493
 URL: https://issues.apache.org/jira/browse/YARN-3493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
Reporter: Sumana Sathish
Assignee: Jian He
Priority: Critical
 Fix For: 2.8.0, 2.7.1

 Attachments: YARN-3493.1.patch, YARN-3493.2.patch, YARN-3493.3.patch, 
 YARN-3493.4.patch, YARN-3493.5.patch, yarn-yarn-resourcemanager.log.zip


 RM fails to come up for the following case:
 1. Change yarn.nodemanager.resource.memory-mb and 
 yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
 background and wait for the job to reach running state
 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 
 before the above job completes
 4. Restart RM
 5. RM fails to come up with the below error
 {code:title= RM error for Mem settings changed}
  - RM app submission failed in validating AM resource request for application 
 application_1429094976272_0008
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
 (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested memory < 0, or requested memory > max configured, 
 requestedMemory=3072, maxMemory=2048
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at 
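
For illustration, a simplified, self-contained sketch (not the actual SchedulerUtils code, and using a plain IllegalArgumentException as a stand-in for InvalidResourceRequestException) of the validation that rejects the recovered AM resource request once the maximum allocation is lowered:
{code:title=Simplified validation sketch (hypothetical)}
// Hypothetical, simplified stand-in for the check that fails above: a request
// recovered from the state store is re-validated against the *current*
// maximum-allocation-mb, so lowering that value makes recovery fail.
public class ResourceValidationSketch {
  static void validateMemory(int requestedMemory, int maxMemory) {
    if (requestedMemory < 0 || requestedMemory > maxMemory) {
      throw new IllegalArgumentException(
          "Invalid resource request, requested memory < 0, or requested memory > max configured,"
          + " requestedMemory=" + requestedMemory + ", maxMemory=" + maxMemory);
    }
  }

  public static void main(String[] args) {
    validateMemory(3072, 2048);   // reproduces the failure mode from the log above
  }
}
{code}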
 

[jira] [Updated] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put

2015-05-11 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-3625:
--
Description: RollingLevelDBTimelineStore batches all entities in the same 
put to improve performance. However, this causes an error when an entity relates 
to another entity in the same put.
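
A schematic illustration (hypothetical code, not the RollingLevelDBTimelineStore internals) of why an existence check that only consults already-flushed data rejects a related entity written in the same batched put:
{code:title=Same-batch relation sketch (hypothetical)}
import java.util.*;

public class SameBatchRelationSketch {
  private final Set<String> flushed = new HashSet<>();   // entities already written to the store

  /** Returns the ids whose related entity could not be resolved. */
  public List<String> putBatch(Map<String, String> batch /* id -> relatedId, may be null */) {
    List<String> unresolved = new ArrayList<>();
    for (Map.Entry<String, String> e : batch.entrySet()) {
      String related = e.getValue();
      // Buggy check: it only consults flushed data, so a related entity that is
      // part of this very batch is treated as missing.
      if (related != null && !flushed.contains(related)) {
        unresolved.add(e.getKey());
      }
    }
    flushed.addAll(batch.keySet());   // the whole batch is flushed afterwards
    return unresolved;
  }
}
{code}
Resolving relations correctly would also need to consider the entities being written in the same batch, which is what the attached patch presumably addresses.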

 RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
 --

 Key: YARN-3625
 URL: https://issues.apache.org/jira/browse/YARN-3625
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-3625.1.patch


 RollingLevelDBTimelineStore batches all entities in the same put to improve 
 performance. However, this causes an error when an entity relates to another 
 entity in the same put.





[jira] [Resolved] (YARN-1886) Exceptions in the RM log while cleaning up app attempt

2015-05-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He resolved YARN-1886.
---
Resolution: Duplicate

 Exceptions in the RM log while cleaning up app attempt
 --

 Key: YARN-1886
 URL: https://issues.apache.org/jira/browse/YARN-1886
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta

 Noticed exceptions in the RM log while HA tests were running in which we killed 
 the RM/AM/Namenode, etc.
 The RM failed over, and the new active RM tried to kill the old app attempt and 
 ran into this exception.





[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-05-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538803#comment-14538803
 ] 

Hudson commented on YARN-3434:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7799 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7799/])
Moved YARN-3434. (Interaction between reservations and userlimit can result in 
significant ULF violation.) From 2.8.0 to 2.7.1 (wangda: rev 
1952f9395870e7b631d43418e075e774b9d2)
* hadoop-yarn-project/CHANGES.txt


 Interaction between reservations and userlimit can result in significant ULF 
 violation
 --

 Key: YARN-3434
 URL: https://issues.apache.org/jira/browse/YARN-3434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Fix For: 2.8.0, 2.7.1

 Attachments: YARN-3434-branch2.7.patch, YARN-3434.patch, 
 YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, 
 YARN-3434.patch, YARN-3434.patch


 ULF was set to 1.0.
 The user was able to consume 1.4X the queue capacity.
 It looks like when this application launched, it reserved about 1000 
 containers of 8G each within about 5 seconds. I think this allowed the 
 logic in assignToUser() to let the user limit be surpassed.





[jira] [Commented] (YARN-1886) Exceptions in the RM log while cleaning up app attempt

2015-05-11 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538800#comment-14538800
 ] 

Jian He commented on YARN-1886:
---

YARN-1885 fixed this problem. Closing this as a duplicate.

 Exceptions in the RM log while cleaning up app attempt
 --

 Key: YARN-1886
 URL: https://issues.apache.org/jira/browse/YARN-1886
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta

 Noticed exceptions in the RM log while HA tests were running in which we killed 
 the RM/AM/Namenode, etc.
 The RM failed over, and the new active RM tried to kill the old app attempt and 
 ran into this exception.





[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-11 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538849#comment-14538849
 ] 

Craig Welch commented on YARN-3626:
---

To resolve this, the situation should be detected and, when applicable, 
localized resources should be put at the beginning of the classpath rather than 
the end.
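
As an illustration only (hypothetical helper, not the YARN-3626 patch), the difference between the two orderings when the classpath string is assembled:
{code:title=Classpath ordering sketch (hypothetical)}
import java.util.ArrayList;
import java.util.List;

public class ClasspathOrderSketch {
  // Builds a classpath string, putting localized entries first when preferLocalized is set.
  static String buildClasspath(List<String> systemEntries, List<String> localizedEntries,
                               boolean preferLocalized) {
    List<String> ordered = new ArrayList<>();
    if (preferLocalized) {
      ordered.addAll(localizedEntries);   // user/localized jars win over system jars
      ordered.addAll(systemEntries);
    } else {
      ordered.addAll(systemEntries);      // current behavior: localized jars appended last
      ordered.addAll(localizedEntries);
    }
    return String.join(";", ordered);     // ';' path separator as on Windows
  }
}
{code}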

 On Windows localized resources are not moved to the front of the classpath 
 when they should be
 --

 Key: YARN-3626
 URL: https://issues.apache.org/jira/browse/YARN-3626
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch

 In response to the mapreduce.job.user.classpath.first setting the classpath 
 is ordered differently so that localized resources will appear before system 
 classpath resources when tasks execute.  On Windows this does not work 
 because the localized resources are not linked into their final location when 
 the classpath jar is created.  To compensate for that, localized jar resources 
 are added directly to the classpath generated for the jar rather than being 
 discovered from the localized directories.  Unfortunately, they are always 
 appended to the classpath, and so are never preferred over system resources.





[jira] [Resolved] (YARN-3618) Fix unused variable to get CPU frequency on Windows systems

2015-05-11 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula resolved YARN-3618.

Resolution: Duplicate

 Fix unused variable to get CPU frequency on Windows systems
 ---

 Key: YARN-3618
 URL: https://issues.apache.org/jira/browse/YARN-3618
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
 Environment: Windows 7 x64 SP1
Reporter: Georg Berendt
Priority: Minor
  Labels: easyfix
   Original Estimate: 1h
  Remaining Estimate: 1h

 In the class 'WindowsResourceCalculatorPlugin.java' of the YARN project, 
 there is an unused variable for CPU frequency.
   /** {@inheritDoc} */
   @Override
   public long getCpuFrequency() {
     refreshIfNeeded();
     return -1;
   }
 Please change '-1' to use 'cpuFrequencyKhz'.
 org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java
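
A minimal sketch of the suggested change, assuming (as the description implies) that the class already keeps a cpuFrequencyKhz field refreshed by refreshIfNeeded():
{code:title=Suggested fix (sketch)}
  /** {@inheritDoc} */
  @Override
  public long getCpuFrequency() {
    refreshIfNeeded();        // refresh the cached Windows system counters
    return cpuFrequencyKhz;   // return the cached CPU frequency instead of -1
  }
{code}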





[jira] [Commented] (YARN-3618) Fix unused variable to get CPU frequency on Windows systems

2015-05-11 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538844#comment-14538844
 ] 

Brahma Reddy Battula commented on YARN-3618:


Resolved as a duplicate of YARN-3617, as both are the same.

 Fix unused variable to get CPU frequency on Windows systems
 ---

 Key: YARN-3618
 URL: https://issues.apache.org/jira/browse/YARN-3618
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
 Environment: Windows 7 x64 SP1
Reporter: Georg Berendt
Priority: Minor
  Labels: easyfix
   Original Estimate: 1h
  Remaining Estimate: 1h

 In the class 'WindowsResourceCalculatorPlugin.java' of the YARN project, 
 there is an unused variable for CPU frequency.
   /** {@inheritDoc} */
   @Override
   public long getCpuFrequency() {
     refreshIfNeeded();
     return -1;
   }
 Please change '-1' to use 'cpuFrequencyKhz'.
 org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java





[jira] [Created] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-11 Thread Craig Welch (JIRA)
Craig Welch created YARN-3626:
-

 Summary: On Windows localized resources are not moved to the front 
of the classpath when they should be
 Key: YARN-3626
 URL: https://issues.apache.org/jira/browse/YARN-3626
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch


In response to the mapreduce.job.user.classpath.first setting the classpath is 
ordered differently so that localized resources will appear before system 
classpath resources when tasks execute.  On Windows this does not work because 
the localized resources are not linked into their final location when the 
classpath jar is created.  To compensate for that, localized jar resources are 
added directly to the classpath generated for the jar rather than being 
discovered from the localized directories.  Unfortunately, they are always 
appended to the classpath, and so are never preferred over system resources.





[jira] [Updated] (YARN-1297) Miscellaneous Fair Scheduler speedups

2015-05-11 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-1297:
--
Attachment: YARN-1297.4.patch

Updating patch to fix the test failure

* Had missed accounting for app container recovery during scheduler recovery.

 Miscellaneous Fair Scheduler speedups
 -

 Key: YARN-1297
 URL: https://issues.apache.org/jira/browse/YARN-1297
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Sandy Ryza
Assignee: Arun Suresh
  Labels: BB2015-05-TBR
 Attachments: YARN-1297-1.patch, YARN-1297-2.patch, YARN-1297.3.patch, 
 YARN-1297.4.patch, YARN-1297.patch, YARN-1297.patch


 I ran the Fair Scheduler's core scheduling loop through a profiler tool and 
 identified a bunch of minimally invasive changes that can shave off a few 
 milliseconds.
 The main one is demoting a couple of INFO log messages to DEBUG, which brought 
 my benchmark down from 16000 ms to 6000 ms.
 A few others (which had way less of an impact) were:
 * Most of the time in comparisons was being spent in Math.signum.  I switched 
 this to direct ifs and elses, and it halved the percentage of time spent in 
 comparisons (a sketch of this change follows after this list).
 * I removed some unnecessary instantiations of Resource objects
 * I made it so that queues' usage wasn't calculated from the applications up 
 each time getResourceUsage was called.
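
A minimal sketch of the first bullet's change, replacing a Math.signum-based comparison with direct branches (hypothetical variable names, not the actual Fair Scheduler comparator):
{code:title=Comparator sketch (hypothetical)}
public class ComparatorSketch {
  // Before: (int) Math.signum(ratio1 - ratio2) routes the comparison through
  // floating-point sign extraction, which showed up hot in the profile.
  // After: direct branches avoid the extra double arithmetic.
  static int compareRatios(double ratio1, double ratio2) {
    if (ratio1 < ratio2) {
      return -1;
    } else if (ratio1 > ratio2) {
      return 1;
    }
    return 0;
  }
}
{code}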





[jira] [Updated] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put

2015-05-11 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-3625:
--
Attachment: YARN-3625.1.patch

 RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
 --

 Key: YARN-3625
 URL: https://issues.apache.org/jira/browse/YARN-3625
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-3625.1.patch








[jira] [Resolved] (YARN-2000) Fix ordering of starting services inside the RM

2015-05-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He resolved YARN-2000.
---
Resolution: Invalid

 Fix ordering of starting services inside the RM
 ---

 Key: YARN-2000
 URL: https://issues.apache.org/jira/browse/YARN-2000
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He

 The order of starting services in the RM would be:
 - Recovery of the app/attempts
 - Start the scheduler and add scheduler app/attempts
 - Start ResourceTrackerService and re-populate the containers in the scheduler 
 based on the container info from NMs 
 - ApplicationMasterService either doesn't start, or starts but blocks until all 
 the previous NMs register.
 Other than these, there are other services like ClientRMService and Webapps 
 whose start order we also need to think about.





[jira] [Commented] (YARN-2000) Fix ordering of starting services inside the RM

2015-05-11 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538822#comment-14538822
 ] 

Jian He commented on YARN-2000:
---

bq. Probably we can have state-store stop last so that all the other services 
are stopped first and won't accept more requests and send events to state-store.
Even if the state-store stops first, API calls such as submitApplication won't 
return successfully until the state-store operation completes. 
Nothing to be done; closing.

 Fix ordering of starting services inside the RM
 ---

 Key: YARN-2000
 URL: https://issues.apache.org/jira/browse/YARN-2000
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He

 The order of starting services in the RM would be:
 - Recovery of the app/attempts
 - Start the scheduler and add scheduler app/attempts
 - Start ResourceTrackerService and re-populate the containers in the scheduler 
 based on the container info from NMs 
 - ApplicationMasterService either doesn't start, or starts but blocks until all 
 the previous NMs register.
 Other than these, there are other services like ClientRMService and Webapps 
 whose start order we also need to think about.





[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI

2015-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538836#comment-14538836
 ] 

Wangda Tan commented on YARN-3362:
--

Hi Naga,
Thanks for updating,

1) To your questions: 
https://issues.apache.org/jira/browse/YARN-3362?focusedCommentId=14537181page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14537181,
You can refer to YARN-2824 for more information about why the default capacity 
of labeled resources is set to zero.
The default of max-capacity is 100 because a queue can then use such resources 
without configuring them explicitly.  Let me know if you have more questions.

2) About showing resources of partitions, I think it's very helpful. I think 
you can include the used resources of each partition as well. You can file a 
separate ticket if that is hard to add within this one.

3) About Hide Hierarchy, I think it's good for queue capacity comparison, but 
admins may get confused after checking Hide Hierarchy; it would be better to add 
this somewhere else instead of modifying the queue UI itself.

 Add node label usage in RM CapacityScheduler web UI
 ---

 Key: YARN-3362
 URL: https://issues.apache.org/jira/browse/YARN-3362
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager, webapp
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: 2015.05.06 Folded Queues.png, 2015.05.06 Queue 
 Expanded.png, 2015.05.07_3362_Queue_Hierarchy.png, 
 2015.05.10_3362_Queue_Hierarchy.png, CSWithLabelsView.png, 
 No-space-between-Active_user_info-and-next-queues.png, Screen Shot 2015-04-29 
 at 11.42.17 AM.png, YARN-3362.20150428-3-modified.patch, 
 YARN-3362.20150428-3.patch, YARN-3362.20150506-1.patch, 
 YARN-3362.20150507-1.patch, YARN-3362.20150510-1.patch, 
 YARN-3362.20150511-1.patch, capacity-scheduler.xml


 We don't show node label usage in the RM CapacityScheduler web UI now. Without 
 this, it is hard for users to understand what is happening on nodes that have 
 labels assigned to them.





[jira] [Updated] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-11 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3626:
--
Attachment: YARN-3626.0.patch

The attached patch propagates the conditional as a YARN configuration option 
and moves localized resources to the front of the classpath when appropriate.

 On Windows localized resources are not moved to the front of the classpath 
 when they should be
 --

 Key: YARN-3626
 URL: https://issues.apache.org/jira/browse/YARN-3626
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3626.0.patch


 In response to the mapreduce.job.user.classpath.first setting the classpath 
 is ordered differently so that localized resources will appear before system 
 classpath resources when tasks execute.  On Windows this does not work 
 because the localized resources are not linked into their final location when 
 the classpath jar is created.  To compensate for that, localized jar resources 
 are added directly to the classpath generated for the jar rather than being 
 discovered from the localized directories.  Unfortunately, they are always 
 appended to the classpath, and so are never preferred over system resources.





[jira] [Created] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put

2015-05-11 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created YARN-3625:
-

 Summary: RollingLevelDBTimelineStore Incorrectly Forbids Related 
Entity in Same Put
 Key: YARN-3625
 URL: https://issues.apache.org/jira/browse/YARN-3625
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles







