[jira] [Commented] (YARN-3614) FileSystemRMStateStore throws an exception when it fails to remove an application, causing the ResourceManager to crash
[ https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537625#comment-14537625 ]
lachisis commented on YARN-3614:
Yes, it is OK to check the existence of the directory first.

FileSystemRMStateStore throws an exception when it fails to remove an application, causing the ResourceManager to crash
Key: YARN-3614
URL: https://issues.apache.org/jira/browse/YARN-3614
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.5.0
Reporter: lachisis
Priority: Critical

FileSystemRMStateStore is only an auxiliary plug-in of the RM state store. When it fails to remove an application, a warning should be enough, but currently the ResourceManager crashes.
Recently I configured yarn.resourcemanager.state-store.max-completed-applications to limit the number of applications kept in the state store. When the number of applications exceeds the limit, some old applications are removed. If a removal fails, the ResourceManager crashes. The following is the log:
{code}
2015-05-11 06:58:43,815 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Removing info for app: application_1430994493305_0053
2015-05-11 06:58:43,815 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Removing info for app: application_1430994493305_0053 at: /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
2015-05-11 06:58:43,816 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error removing app: application_1430994493305_0053
java.lang.Exception: Failed to delete /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171)
	at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:879)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:874)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:745)
2015-05-11 06:58:43,819 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED.
Cause: java.lang.Exception: Failed to delete /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171)
	at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
{code}
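A minimal sketch of the change being discussed (check existence before deleting, and warn instead of failing), assuming the deleteFile helper in FileSystemRMStateStore wraps the FileSystem instance the store already holds; this only illustrates the idea and is not the committed patch:
{code}
// Sketch only, not the actual YARN-3614 patch. Assumes "fs" is the
// org.apache.hadoop.fs.FileSystem used by FileSystemRMStateStore and
// "LOG" is its existing commons-logging Log.
private void deleteFile(Path deletePath) throws Exception {
  // If the node is already gone (for example after a restored backup of the
  // state store), treat the removal as a no-op instead of raising a fatal
  // STATE_STORE_OP_FAILED event that kills the ResourceManager.
  if (!fs.exists(deletePath)) {
    LOG.warn("Path " + deletePath + " does not exist, skipping delete");
    return;
  }
  if (!fs.delete(deletePath, true)) {
    throw new Exception("Failed to delete " + deletePath);
  }
}
{code}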
[jira] [Commented] (YARN-3557) Support Intel Trusted Execution Technology (TXT) in YARN scheduler
[ https://issues.apache.org/jira/browse/YARN-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537681#comment-14537681 ]
Dian Fu commented on YARN-3557:
Hi [~leftnoteasy], I have posted the requirements for supporting the configuration of constraint node labels from both the RM and the NM on YARN-3409. Regarding support for script-based node label configuration on the RM side, what are your thoughts?

Support Intel Trusted Execution Technology (TXT) in YARN scheduler
Key: YARN-3557
URL: https://issues.apache.org/jira/browse/YARN-3557
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Dian Fu
Attachments: Support TXT in YARN high level design doc.pdf

Intel TXT defines platform-level enhancements that provide the building blocks for creating trusted platforms. A TXT-aware YARN scheduler can schedule security-sensitive jobs on TXT-enabled nodes only. YARN-2492 provides the capability to restrict YARN applications to run only on cluster nodes that have a specified node label. This is a good mechanism that can be utilized for a TXT-aware YARN scheduler.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3614) FileSystemRMStateStore throws an exception when it fails to remove an application, causing the ResourceManager to crash
[ https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537662#comment-14537662 ]
Brahma Reddy Battula commented on YARN-3614:
{quote}when the standby ResourceManager tries to transition to active, it takes more than ten minutes to load applications{quote}
Did you dig into this one, i.e. why it took more than ten minutes? Thanks
[jira] [Commented] (YARN-3614) FileSystemRMStateStore throws an exception when it fails to remove an application, causing the ResourceManager to crash
[ https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537640#comment-14537640 ]
lachisis commented on YARN-3614:
I use YARN HA for a stable service. Months later, I found that when the standby ResourceManager tried to transition to active, it took more than ten minutes to load the applications. So I backed up the rmstore in HDFS and set yarn.resourcemanager.state-store.max-completed-applications to limit the number of applications in the rmstore, and found that the transition then worked well. Later my partner restored the backed-up rmstore and submitted a new application, and the ResourceManager crashed. I know that restoring a backed-up rmstore while the ResourceManager is running is not suitable, but this also means the processing logic of FileSystemRMStateStore is a little weak. So I suggest a small change here.
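For reference, the property mentioned above is set in yarn-site.xml; the value below is only an example, not the value used by the reporter:
{code}
<!-- yarn-site.xml: cap the number of completed applications kept in the
     RM state store. The value here is purely illustrative. -->
<property>
  <name>yarn.resourcemanager.state-store.max-completed-applications</name>
  <value>1000</value>
</property>
{code}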
[jira] [Commented] (YARN-3409) Add constraint node labels
[ https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537676#comment-14537676 ]
Dian Fu commented on YARN-3409:
Just to post the requirements discussed in YARN-3557 here: it should be possible to add constraint node labels from both the RM and the NM, since some labels, such as the TRUSTED/UNTRUSTED labels described in YARN-3557, need to be added from the RM, while labels such as GPU, FPGA, LINUX, or WINDOWS are more naturally added from the NM. A large cluster may have all of these kinds of labels coexisting.

Add constraint node labels
Key: YARN-3409
URL: https://issues.apache.org/jira/browse/YARN-3409
Project: Hadoop YARN
Issue Type: Sub-task
Components: api, capacityscheduler, client
Reporter: Wangda Tan
Assignee: Wangda Tan

Specifying only one label for each node (in other words, partitioning the cluster) is a way to determine how the resources of a particular set of nodes can be shared by a group of entities (teams, departments, etc.). Partitions of a cluster have the following characteristics:
- The cluster is divided into several disjoint sub-clusters.
- ACLs/priority can apply to a partition (e.g., only the market team has priority to use the partition).
- Percentages of capacity can apply to a partition (the market team has a 40% minimum capacity and the dev team has a 60% minimum capacity of the partition).
Constraints are orthogonal to partitions; they describe attributes of a node's hardware/software, just for affinity. Some examples of constraints:
- glibc version
- JDK version
- Type of CPU (x86_64/i686)
- Type of OS (windows, linux, etc.)
With this, an application can ask for a resource that has (glibc.version = 2.20, JDK.version = 8u20, x86_64).
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
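As a rough sketch of how an application can already target labelled nodes through the YARN-2492 node-label mechanism referenced above (the "TRUSTED" label name is hypothetical and would have to exist in the cluster; constraint expressions such as glibc.version would need the new feature proposed in this issue):
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class TrustedNodeRequestExample {
  // Builds a request that only nodes carrying the (hypothetical) "TRUSTED"
  // node label can satisfy, using the existing node-label expression API.
  public static ResourceRequest buildTrustedRequest() {
    ResourceRequest req = ResourceRequest.newInstance(
        Priority.newInstance(0),        // request priority
        ResourceRequest.ANY,            // any node/rack
        Resource.newInstance(2048, 1),  // 2 GB memory, 1 vcore
        1);                             // number of containers
    req.setNodeLabelExpression("TRUSTED");
    return req;
  }
}
{code}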
[jira] [Commented] (YARN-3614) FileSystemRMStateStore throws an exception when it fails to remove an application, causing the ResourceManager to crash
[ https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537632#comment-14537632 ]
lachisis commented on YARN-3614:
Sorry, terrible network. How can I delete the repeated replies?
[jira] [Commented] (YARN-3614) FileSystemRMStateStore throws an exception when it fails to remove an application, causing the ResourceManager to crash
[ https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537645#comment-14537645 ]
lachisis commented on YARN-3614:
Thanks for the chance to provide the patch. I will submit it later.
[jira] [Commented] (YARN-3170) YARN architecture document needs updating
[ https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537726#comment-14537726 ] Tsuyoshi Ozawa commented on YARN-3170: [~brahmareddy] thank you for updating. {quote}We call MapReduce running on YARN MapReduce 2.0 (MRv2).{quote} A trailing double quotation mark is missing; please add it before the period. YARN architecture document needs updating - Key: YARN-3170 URL: https://issues.apache.org/jira/browse/YARN-3170 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Allen Wittenauer Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3170-002.patch, YARN-3170-003.patch, YARN-3170.patch The marketing paragraph at the top, NextGen MapReduce, etc. are all marketing rather than actual descriptions. It also needs some general updates, especially given it reads as though 0.23 was just released yesterday. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-817) If input path does not exist application/job id is getting assigned.
[ https://issues.apache.org/jira/browse/YARN-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537817#comment-14537817 ]
Rohith commented on YARN-817:
The input path is used by the application JVM. The application client should handle this before submitting the application to YARN. Closing as Invalid; reopen if there is any concern about this.

If input path does not exist application/job id is getting assigned.
Key: YARN-817
URL: https://issues.apache.org/jira/browse/YARN-817
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Nishan Shetty
Priority: Minor

1. Run a job giving an input path which does not exist.
2. An application/job id is still getting assigned:
2013-06-12 16:00:24,494 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 12
Suggestion: an input path check can be made before assigning the job/app id.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
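A minimal sketch of the client-side validation Rohith suggests, assuming a plain MapReduce job driver; the path handling and class name are illustrative only:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class InputPathCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path(args[0]);              // input path supplied by the user
    FileSystem fs = input.getFileSystem(conf);
    // Fail fast on the client, before YARN allocates an application id.
    if (!fs.exists(input)) {
      System.err.println("Input path does not exist: " + input);
      System.exit(1);
    }
    // ... only now build the Job and call job.waitForCompletion(true)
  }
}
{code}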
[jira] [Resolved] (YARN-817) If input path does not exist application/job id is getting assigned.
[ https://issues.apache.org/jira/browse/YARN-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith resolved YARN-817. Resolution: Invalid If input path does not exist application/job id is getting assigned. Key: YARN-817 URL: https://issues.apache.org/jira/browse/YARN-817 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Nishan Shetty Priority: Minor 1. Run a job giving an input path which does not exist. 2. An application/job id is still getting assigned: 2013-06-12 16:00:24,494 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 12 Suggestion: an input path check can be made before assigning the job/app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3170) YARN architecture document needs updating
[ https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3170: --- Attachment: YARN-3170-004.patch YARN architecture document needs updating - Key: YARN-3170 URL: https://issues.apache.org/jira/browse/YARN-3170 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Allen Wittenauer Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3170-002.patch, YARN-3170-003.patch, YARN-3170-004.patch, YARN-3170.patch The marketing paragraph at the top, NextGen MapReduce, etc are all marketing rather than actual descriptions. It also needs some general updates, esp given it reads as though 0.23 was just released yesterday. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3170) YARN architecture document needs updating
[ https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537780#comment-14537780 ]
Hadoop QA commented on YARN-3170:
| (/) *{color:green}+1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 2m 53s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | site | 2m 57s | Site still builds. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| | | 6m 13s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12731885/YARN-3170-004.patch |
| Optional Tests | site |
| git revision | trunk / 3fa2efc |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7860/console |

This message was automatically generated.

YARN architecture document needs updating
Key: YARN-3170
URL: https://issues.apache.org/jira/browse/YARN-3170
Project: Hadoop YARN
Issue Type: Improvement
Components: documentation
Reporter: Allen Wittenauer
Assignee: Brahma Reddy Battula
Labels: BB2015-05-TBR
Attachments: YARN-3170-002.patch, YARN-3170-003.patch, YARN-3170-004.patch, YARN-3170.patch

The marketing paragraph at the top, NextGen MapReduce, etc. are all marketing rather than actual descriptions. It also needs some general updates, especially given it reads as though 0.23 was just released yesterday.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3170) YARN architecture document needs updating
[ https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537772#comment-14537772 ] Brahma Reddy Battula commented on YARN-3170: [~ozawa] updated the patch. Kindly review, thanks. YARN architecture document needs updating - Key: YARN-3170 URL: https://issues.apache.org/jira/browse/YARN-3170 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Allen Wittenauer Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3170-002.patch, YARN-3170-003.patch, YARN-3170-004.patch, YARN-3170.patch The marketing paragraph at the top, NextGen MapReduce, etc. are all marketing rather than actual descriptions. It also needs some general updates, especially given it reads as though 0.23 was just released yesterday. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3615) Yarn and Mapred queue CLI command support for Fairscheduler
Bibin A Chundatt created YARN-3615:
Summary: Yarn and Mapred queue CLI command support for Fairscheduler
Key: YARN-3615
URL: https://issues.apache.org/jira/browse/YARN-3615
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler, scheduler
Reporter: Bibin A Chundatt
Assignee: Naganarasimha G R

Add support for these CLI commands when the Fair scheduler is configured. A few commands that need updating are listed below.

./yarn queue -status job-queue-name
*Current output*
{code}
Queue Name : root.sls_queue_2
State : RUNNING
Capacity : 100.0%
Current Capacity : 100.0%
Maximum Capacity : -100.0%
Default Node Label expression :
Accessible Node Labels :
{code}
./mapred queue -info job-queue-name
./mapred queue -list

All of the commands listed above currently display output based on the Capacity scheduler.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3513) Remove unused variables in ContainersMonitorImpl and add debug log for overall resource usage by all containers
[ https://issues.apache.org/jira/browse/YARN-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Naganarasimha G R updated YARN-3513:
Attachment: YARN-3513.20150511-1.patch
Ok [~devaraj.k], updated the patch as per your suggestion.

Remove unused variables in ContainersMonitorImpl and add debug log for overall resource usage by all containers
Key: YARN-3513
URL: https://issues.apache.org/jira/browse/YARN-3513
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Trivial
Labels: BB2015-05-TBR, newbie
Attachments: YARN-3513.20150421-1.patch, YARN-3513.20150503-1.patch, YARN-3513.20150506-1.patch, YARN-3513.20150507-1.patch, YARN-3513.20150508-1.patch, YARN-3513.20150508-1.patch, YARN-3513.20150511-1.patch

Some local variables in MonitoringThread.run(), {{vmemStillInUsage}} and {{pmemStillInUsage}}, are not used and are just updated. Instead we need to add a debug log for overall resource usage by all containers.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
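A minimal sketch of the kind of debug log the issue asks for, assuming the per-container virtual and physical memory totals have already been accumulated inside the monitoring loop; the helper name and parameters are illustrative, not taken from the patch:
{code}
// Sketch only (not the committed patch): one way to emit a single DEBUG line
// for the node once per-container usage has been summed up.
private static void logOverallUsage(org.apache.commons.logging.Log log,
    long vmemUsageByAllContainers, long pmemUsageByAllContainers) {
  if (log.isDebugEnabled()) {
    log.debug("Total resource usage by all containers: vmem="
        + vmemUsageByAllContainers + " bytes, pmem="
        + pmemUsageByAllContainers + " bytes");
  }
}
{code}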
[jira] [Commented] (YARN-3587) Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.
[ https://issues.apache.org/jira/browse/YARN-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537869#comment-14537869 ]
Hudson commented on YARN-3587:
FAILURE: Integrated in Hadoop-trunk-Commit #7790 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7790/])
YARN-3587. Fix the javadoc of DelegationTokenSecretManager in yarn, etc. projects. Contributed by Gabor Liptak. (junping_du: rev 7e543c27fa2881aa65967be384a6203bd5b2304f)
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMDelegationTokenSecretManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineDelegationTokenSecretManagerService.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JHSDelegationTokenSecretManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/token/delegation/DelegationTokenSecretManager.java

Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.
Key: YARN-3587
URL: https://issues.apache.org/jira/browse/YARN-3587
Project: Hadoop YARN
Issue Type: Bug
Components: documentation
Affects Versions: 2.7.0
Reporter: Akira AJISAKA
Assignee: Gabor Liptak
Priority: Minor
Labels: newbie
Fix For: 2.8.0
Attachments: YARN-3587.1.patch, YARN-3587.patch

In RMDelegationTokenSecretManager and TimelineDelegationTokenSecretManager, the javadoc of the constructor is as follows:
{code}
/**
 * Create a secret manager
 * @param delegationKeyUpdateInterval the number of seconds for rolling new
 *        secret keys.
 * @param delegationTokenMaxLifetime the maximum lifetime of the delegation
 *        tokens
 * @param delegationTokenRenewInterval how often the tokens must be renewed
 * @param delegationTokenRemoverScanInterval how often the tokens are scanned
 *        for expired tokens
 */
{code}
1. the number of seconds should be the number of milliseconds.
2. It's better to add time unit to the description of other parameters.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
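For illustration, a corrected constructor javadoc along the lines requested in points 1 and 2 of the description might read as follows; this is a sketch of the requested wording, not necessarily the exact text of the committed patch:
{code}
/**
 * Create a secret manager.
 * @param delegationKeyUpdateInterval the number of milliseconds for rolling
 *        new secret keys
 * @param delegationTokenMaxLifetime the maximum lifetime of the delegation
 *        tokens, in milliseconds
 * @param delegationTokenRenewInterval how often the tokens must be renewed,
 *        in milliseconds
 * @param delegationTokenRemoverScanInterval how often the tokens are scanned
 *        for expired tokens, in milliseconds
 */
{code}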
[jira] [Resolved] (YARN-3596) Fix the javadoc of DelegationTokenSecretManager in hadoop-common
[ https://issues.apache.org/jira/browse/YARN-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-3596. -- Resolution: Duplicate Fix the javadoc of DelegationTokenSecretManager in hadoop-common Key: YARN-3596 URL: https://issues.apache.org/jira/browse/YARN-3596 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gabor Liptak Priority: Trivial Attachments: YARN-3596.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3597) Fix the javadoc of DelegationTokenSecretManager in hadoop-hdfs
[ https://issues.apache.org/jira/browse/YARN-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-3597. -- Resolution: Duplicate Fix the javadoc of DelegationTokenSecretManager in hadoop-hdfs -- Key: YARN-3597 URL: https://issues.apache.org/jira/browse/YARN-3597 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gabor Liptak Priority: Trivial Attachments: YARN-3597.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3587) Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.
[ https://issues.apache.org/jira/browse/YARN-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Junping Du updated YARN-3587:
Hadoop Flags: Reviewed

Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.
Key: YARN-3587
URL: https://issues.apache.org/jira/browse/YARN-3587
Project: Hadoop YARN
Issue Type: Bug
Components: documentation
Affects Versions: 2.7.0
Reporter: Akira AJISAKA
Assignee: Gabor Liptak
Priority: Minor
Labels: newbie
Fix For: 2.8.0
Attachments: YARN-3587.1.patch, YARN-3587.patch

In RMDelegationTokenSecretManager and TimelineDelegationTokenSecretManager, the javadoc of the constructor is as follows:
{code}
/**
 * Create a secret manager
 * @param delegationKeyUpdateInterval the number of seconds for rolling new
 *        secret keys.
 * @param delegationTokenMaxLifetime the maximum lifetime of the delegation
 *        tokens
 * @param delegationTokenRenewInterval how often the tokens must be renewed
 * @param delegationTokenRemoverScanInterval how often the tokens are scanned
 *        for expired tokens
 */
{code}
1. the number of seconds should be the number of milliseconds.
2. It's better to add time unit to the description of other parameters.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3598) Fix the javadoc of DelegationTokenSecretManager in hadoop-mapreduce
[ https://issues.apache.org/jira/browse/YARN-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-3598. -- Resolution: Duplicate Fix the javadoc of DelegationTokenSecretManager in hadoop-mapreduce --- Key: YARN-3598 URL: https://issues.apache.org/jira/browse/YARN-3598 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gabor Liptak Priority: Trivial Attachments: YARN-3598.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3276) Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings
[ https://issues.apache.org/jira/browse/YARN-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537911#comment-14537911 ]
Junping Du commented on YARN-3276:
Thanks [~zjshen] for the review and comments!
bq. TimelineServiceUtils -> TimelineServiceHelper?
Sure. Will update it.
bq. Is mapreduce using it? Maybe simply @Private
In my understanding, @Private means it can be used by Common, HDFS, MapReduce, and YARN, so it would be broader than the current limitation. I didn't remove MapReduce here because, judging from other places, we seem to always keep MapReduce there as a practice even when there is no obvious reference from the MR project. Maybe it is better to keep it here as it is?
bq. TimelineEvent is not covered?
Nice catch! Will update it.
bq. The AllocateResponsePBImpl change is not related?
Yes. There are several findbugs warnings (this one and the change in TimelineMetric.java) involved in the previous patch on branch YARN-2928. I think it would be overkill to file a separate JIRA to fix such simple issues, so I put the fix here and updated the title a little. Does that make sense?

Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings
Key: YARN-3276
URL: https://issues.apache.org/jira/browse/YARN-3276
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Junping Du
Assignee: Junping Du
Attachments: YARN-3276-YARN-2928.v3.patch, YARN-3276-YARN-2928.v4.patch, YARN-3276-v2.patch, YARN-3276-v3.patch, YARN-3276.patch

Per discussion in YARN-3087, we need to refactor some similar logic that casts a map to a HashMap, and get rid of the NPE issue.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
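A minimal sketch of the helper under discussion, assuming the goal stated in the description of converting a possibly-null Map into a HashMap without an NPE; the class and method names follow the TimelineServiceHelper naming suggested above, but the actual signatures in the patch may differ:
{code}
import java.util.HashMap;
import java.util.Map;

public final class TimelineServiceHelper {
  private TimelineServiceHelper() {
  }

  // Convert a possibly-null Map into a HashMap without risking an NPE;
  // reuse the instance when the input is already a HashMap.
  public static <K, V> HashMap<K, V> mapCastToHashMap(Map<K, V> originalMap) {
    if (originalMap == null) {
      return null;
    }
    if (originalMap instanceof HashMap) {
      return (HashMap<K, V>) originalMap;
    }
    return new HashMap<K, V>(originalMap);
  }
}
{code}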
[jira] [Updated] (YARN-3276) Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings
[ https://issues.apache.org/jira/browse/YARN-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3276: - Attachment: YARN-3276-YARN-2928.v5.patch Fix most comments from [~zjshen] in v5 patch. Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings Key: YARN-3276 URL: https://issues.apache.org/jira/browse/YARN-3276 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3276-YARN-2928.v3.patch, YARN-3276-YARN-2928.v4.patch, YARN-3276-YARN-2928.v5.patch, YARN-3276-v2.patch, YARN-3276-v3.patch, YARN-3276.patch Per discussion in YARN-3087, we need to refactor some similar logic to cast map to hashmap and get rid of NPE issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3513) Remove unused variables in ContainersMonitorImpl and add debug log for overall resource usage by all containers
[ https://issues.apache.org/jira/browse/YARN-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537832#comment-14537832 ] Hadoop QA commented on YARN-3513: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 33s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 29s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 36s | The applied patch generated 1 new checkstyle issues (total was 27, now 27). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 1s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 5m 57s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 41m 44s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731901/YARN-3513.20150511-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3fa2efc | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7861/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7861/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7861/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7861/console | This message was automatically generated. Remove unused variables in ContainersMonitorImpl and add debug log for overall resource usage by all containers Key: YARN-3513 URL: https://issues.apache.org/jira/browse/YARN-3513 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Trivial Labels: BB2015-05-TBR, newbie Attachments: YARN-3513.20150421-1.patch, YARN-3513.20150503-1.patch, YARN-3513.20150506-1.patch, YARN-3513.20150507-1.patch, YARN-3513.20150508-1.patch, YARN-3513.20150508-1.patch, YARN-3513.20150511-1.patch Some local variables in MonitoringThread.run() : {{vmemStillInUsage and pmemStillInUsage}} are not used and just updated. Instead we need to add debug log for overall resource usage by all containers -- This message was sent by Atlassian JIRA (v6.3.4#6332)
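To make the proposal concrete, here is a hedged, self-contained sketch of the kind of guarded debug logging being asked for; the class, field layout, and log wording are hypothetical and are not taken from ContainersMonitorImpl:
{code}
import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical illustration of the proposed "overall usage" debug log.
class ContainersUsageLogger {
  private static final Logger LOG =
      LoggerFactory.getLogger(ContainersUsageLogger.class);

  /** usage values per container: {pmemBytes, vmemBytes, milliVcores}. */
  static void logOverallUsage(Map<String, long[]> usagePerContainer) {
    long pmem = 0, vmem = 0, milliVcores = 0;
    for (long[] u : usagePerContainer.values()) {
      pmem += u[0];
      vmem += u[1];
      milliVcores += u[2];
    }
    // Guard the string concatenation so it costs nothing when debug is off.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Total resource usage by all containers: pmem=" + pmem
          + "B, vmem=" + vmem + "B, milliVcores=" + milliVcores);
    }
  }
}
{code}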
[jira] [Commented] (YARN-3409) Add constraint node labels
[ https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537858#comment-14537858 ] David Villegas commented on YARN-3409: -- Thanks for your comment, Wangda. I agree that loadAvg may not be useful in all cases. The main idea behind dynamic label values is that the system would be more extensible, and human errors would be reduced if some of the labels could be automatically populated. An example that comes to mind, based on Dian's comment, is the NodeManager's operating system. Rather than having an administrator set it, it could be pre-set to the actual OS by the NM. Add constraint node labels -- Key: YARN-3409 URL: https://issues.apache.org/jira/browse/YARN-3409 Project: Hadoop YARN Issue Type: Sub-task Components: api, capacityscheduler, client Reporter: Wangda Tan Assignee: Wangda Tan Specifying only one label for each node (in other words, partitioning a cluster) is a way to determine how resources of a special set of nodes can be shared by a group of entities (like teams, departments, etc.). Partitions of a cluster have the following characteristics: - The cluster is divided into several disjoint sub-clusters. - ACLs/priority can apply to a partition (only the market team has priority to use the partition). - Percentages of capacity can apply to a partition (the market team has 40% minimum capacity and the dev team has 60% minimum capacity of the partition). Constraints are orthogonal to partitions; they describe attributes of a node's hardware/software just for affinity. Some examples of constraints: - glibc version - JDK version - type of CPU (x86_64/i686) - type of OS (windows, linux, etc.) With this, an application will be able to ask for resources that have (glibc.version >= 2.20 && JDK.version >= 8u20 && x86_64). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3587) Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.
[ https://issues.apache.org/jira/browse/YARN-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3587: - Summary: Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc. (was: Fix the javadoc of DelegationTokenSecretManager in yarn project) Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc. - Key: YARN-3587 URL: https://issues.apache.org/jira/browse/YARN-3587 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.7.0 Reporter: Akira AJISAKA Assignee: Gabor Liptak Priority: Minor Labels: newbie Attachments: YARN-3587.1.patch, YARN-3587.patch In RMDelegationTokenSecretManager and TimelineDelegationTokenSecretManager, the javadoc of the constructor is as follows: {code} /** * Create a secret manager * @param delegationKeyUpdateInterval the number of seconds for rolling new *secret keys. * @param delegationTokenMaxLifetime the maximum lifetime of the delegation *tokens * @param delegationTokenRenewInterval how often the tokens must be renewed * @param delegationTokenRemoverScanInterval how often the tokens are scanned *for expired tokens */ {code} 1. the number of seconds should be the number of milliseconds. 2. It's better to add time unit to the description of other parameters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3614) FileSystemRMStateStore throw exception when failed to remove application, that cause resourcemanager to crash
[ https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537837#comment-14537837 ] nijel commented on YARN-3614: - hi @lachisis bq.when standby resourcemanager try to transitiontoActive, it will cost more than ten minutes to load applications Is this a secure cluster ? FileSystemRMStateStore throw exception when failed to remove application, that cause resourcemanager to crash - Key: YARN-3614 URL: https://issues.apache.org/jira/browse/YARN-3614 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: lachisis Priority: Critical FileSystemRMStateStore is only a accessorial plug-in of rmstore. When it failed to remove application, I think warning is enough, but now resourcemanager crashed. Recently, I configure yarn.resourcemanager.state-store.max-completed-applications to limit applications number in rmstore. when applications number exceed the limit, some old applications will be removed. If failed to remove, resourcemanager will crash. The following is log: 2015-05-11 06:58:43,815 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Removing info for app: application_1430994493305_0053 2015-05-11 06:58:43,815 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Removing info for app: application_1430994493305_0053 at: /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053 2015-05-11 06:58:43,816 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error removing app: application_1430994493305_0053 java.lang.Exception: Failed to delete /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053 at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:879) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:874) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2015-05-11 06:58:43,819 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. 
Cause: java.lang.Exception: Failed to delete /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053 at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at
[jira] [Resolved] (YARN-3599) Fix the javadoc of DelegationTokenSecretManager in hadoop-yarn
[ https://issues.apache.org/jira/browse/YARN-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-3599. -- Resolution: Duplicate Fix the javadoc of DelegationTokenSecretManager in hadoop-yarn -- Key: YARN-3599 URL: https://issues.apache.org/jira/browse/YARN-3599 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gabor Liptak Priority: Trivial Attachments: YARN-3599.1.patch, YARN-3599.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-401) ClientRMService.getQueueInfo can return stale application reports
[ https://issues.apache.org/jira/browse/YARN-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-401. - Resolution: Duplicate This was fixed by YARN-2978. ClientRMService.getQueueInfo can return stale application reports - Key: YARN-401 URL: https://issues.apache.org/jira/browse/YARN-401 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 0.23.6 Reporter: Jason Lowe Priority: Minor ClientRMService.getQueueInfo is modifying a QueueInfo object when application reports are requested. Unfortunately this QueueInfo object could be a persisting object in the scheduler, and modifying it in this way can lead to stale application reports being returned to the client. Here's an example scenario with CapacityScheduler: # A client asks for queue info on queue X with application reports # ClientRMService.getQueueInfo modifies the queue's QueueInfo object and sets application reports on it # Another client asks for recursive queue info from the root queue without application reports # Since the old application reports are still attached to queue X's QueueInfo object, these stale reports appear in the QueueInfo data for queue X in the results Normally if the client is not asking for application reports it won't be looking for and act upon any application reports that happen to appear in the queue info result. However we shouldn't be returning application reports in the first place, and when we do, they shouldn't be stale. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3422) relatedentities always return empty list when primary filter is set
[ https://issues.apache.org/jira/browse/YARN-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537962#comment-14537962 ] Billie Rinaldi commented on YARN-3422: -- That's true, changing the name to indicate direction would also be helpful. I think that fixing this limitation would complicate the write path significantly and is probably not worthwhile in ATS v1. If someone were to implement it, we would need to take before and after performance measurements and possibly make the new feature optional. relatedentities always return empty list when primary filter is set --- Key: YARN-3422 URL: https://issues.apache.org/jira/browse/YARN-3422 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Chang Li Assignee: Chang Li Attachments: YARN-3422.1.patch When you curl for ats entities with a primary filter, the relatedentities fields always return empty list -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3587) Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.
[ https://issues.apache.org/jira/browse/YARN-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538017#comment-14538017 ] Akira AJISAKA commented on YARN-3587: - Agree with [~djp]. Late +1 from me. Thanks [~djp], [~jianhe], and [~gliptak] for contribution! Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc. - Key: YARN-3587 URL: https://issues.apache.org/jira/browse/YARN-3587 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.7.0 Reporter: Akira AJISAKA Assignee: Gabor Liptak Priority: Minor Labels: newbie Fix For: 2.8.0 Attachments: YARN-3587.1.patch, YARN-3587.patch In RMDelegationTokenSecretManager and TimelineDelegationTokenSecretManager, the javadoc of the constructor is as follows: {code} /** * Create a secret manager * @param delegationKeyUpdateInterval the number of seconds for rolling new *secret keys. * @param delegationTokenMaxLifetime the maximum lifetime of the delegation *tokens * @param delegationTokenRenewInterval how often the tokens must be renewed * @param delegationTokenRemoverScanInterval how often the tokens are scanned *for expired tokens */ {code} 1. the number of seconds should be the number of milliseconds. 2. It's better to add time unit to the description of other parameters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3587) Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.
[ https://issues.apache.org/jira/browse/YARN-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538030#comment-14538030 ] Junping Du commented on YARN-3587: -- Thanks [~ajisakaa]! :) Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc. - Key: YARN-3587 URL: https://issues.apache.org/jira/browse/YARN-3587 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.7.0 Reporter: Akira AJISAKA Assignee: Gabor Liptak Priority: Minor Labels: newbie Fix For: 2.8.0 Attachments: YARN-3587.1.patch, YARN-3587.patch In RMDelegationTokenSecretManager and TimelineDelegationTokenSecretManager, the javadoc of the constructor is as follows: {code} /** * Create a secret manager * @param delegationKeyUpdateInterval the number of seconds for rolling new *secret keys. * @param delegationTokenMaxLifetime the maximum lifetime of the delegation *tokens * @param delegationTokenRenewInterval how often the tokens must be renewed * @param delegationTokenRemoverScanInterval how often the tokens are scanned *for expired tokens */ {code} 1. the number of seconds should be the number of milliseconds. 2. It's better to add time unit to the description of other parameters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3587) Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.
[ https://issues.apache.org/jira/browse/YARN-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538034#comment-14538034 ] Hudson commented on YARN-3587: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #192 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/192/]) YARN-3587. Fix the javadoc of DelegationTokenSecretManager in yarn, etc. projects. Contributed by Gabor Liptak. (junping_du: rev 7e543c27fa2881aa65967be384a6203bd5b2304f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineDelegationTokenSecretManagerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMDelegationTokenSecretManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/token/delegation/DelegationTokenSecretManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JHSDelegationTokenSecretManager.java Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc. - Key: YARN-3587 URL: https://issues.apache.org/jira/browse/YARN-3587 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.7.0 Reporter: Akira AJISAKA Assignee: Gabor Liptak Priority: Minor Labels: newbie Fix For: 2.8.0 Attachments: YARN-3587.1.patch, YARN-3587.patch In RMDelegationTokenSecretManager and TimelineDelegationTokenSecretManager, the javadoc of the constructor is as follows: {code} /** * Create a secret manager * @param delegationKeyUpdateInterval the number of seconds for rolling new *secret keys. * @param delegationTokenMaxLifetime the maximum lifetime of the delegation *tokens * @param delegationTokenRenewInterval how often the tokens must be renewed * @param delegationTokenRemoverScanInterval how often the tokens are scanned *for expired tokens */ {code} 1. the number of seconds should be the number of milliseconds. 2. It's better to add time unit to the description of other parameters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3276) Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings
[ https://issues.apache.org/jira/browse/YARN-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537978#comment-14537978 ] Hadoop QA commented on YARN-3276: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 36s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 9m 12s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 49s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 20s | The applied patch generated 2 new checkstyle issues (total was 105, now 107). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 42s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 50s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | | | 47m 14s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731922/YARN-3276-YARN-2928.v5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / b3b791b | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7862/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7862/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7862/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7862/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7862/console | This message was automatically generated. Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings Key: YARN-3276 URL: https://issues.apache.org/jira/browse/YARN-3276 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3276-YARN-2928.v3.patch, YARN-3276-YARN-2928.v4.patch, YARN-3276-YARN-2928.v5.patch, YARN-3276-v2.patch, YARN-3276-v3.patch, YARN-3276.patch Per discussion in YARN-3087, we need to refactor some similar logic to cast map to hashmap and get rid of NPE issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3360) Add JMX metrics to TimelineDataManager
[ https://issues.apache.org/jira/browse/YARN-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-3360: - Attachment: YARN-3360.002.patch Updated patch to trunk. Add JMX metrics to TimelineDataManager -- Key: YARN-3360 URL: https://issues.apache.org/jira/browse/YARN-3360 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Labels: BB2015-05-TBR Attachments: YARN-3360.001.patch, YARN-3360.002.patch The TimelineDataManager currently has no metrics, outside of the standard JVM metrics. It would be very useful to at least log basic counts of method calls, time spent in those calls, and number of entities/events involved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
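For context, a hedged sketch of how such counters are usually exposed through the Hadoop metrics2 system (which also surfaces them over JMX); the metric names and methods below are illustrative, not necessarily those in the attached patch:
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "Timeline data manager metrics", context = "yarn")
public class TimelineDataManagerMetrics {

  @Metric("getEntities calls") MutableCounterLong getEntitiesOps;
  @Metric("getEntities call time") MutableRate getEntitiesTime;
  @Metric("entities returned by getEntities") MutableCounterLong entitiesReturned;

  public static TimelineDataManagerMetrics create() {
    // Registering the source makes the counters visible to metrics2 sinks and JMX.
    return DefaultMetricsSystem.instance().register(
        new TimelineDataManagerMetrics());
  }

  public void incrGetEntitiesOps() {
    getEntitiesOps.incr();
  }

  public void addGetEntitiesTime(long elapsedMs) {
    getEntitiesTime.add(elapsedMs);
  }

  public void incrEntitiesReturned(long count) {
    entitiesReturned.incr(count);
  }
}
{code}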
[jira] [Commented] (YARN-3587) Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc.
[ https://issues.apache.org/jira/browse/YARN-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538046#comment-14538046 ] Hudson commented on YARN-3587: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2140 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2140/]) YARN-3587. Fix the javadoc of DelegationTokenSecretManager in yarn, etc. projects. Contributed by Gabor Liptak. (junping_du: rev 7e543c27fa2881aa65967be384a6203bd5b2304f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMDelegationTokenSecretManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineDelegationTokenSecretManagerService.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/token/delegation/DelegationTokenSecretManager.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JHSDelegationTokenSecretManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java Fix the javadoc of DelegationTokenSecretManager in projects of yarn, etc. - Key: YARN-3587 URL: https://issues.apache.org/jira/browse/YARN-3587 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.7.0 Reporter: Akira AJISAKA Assignee: Gabor Liptak Priority: Minor Labels: newbie Fix For: 2.8.0 Attachments: YARN-3587.1.patch, YARN-3587.patch In RMDelegationTokenSecretManager and TimelineDelegationTokenSecretManager, the javadoc of the constructor is as follows: {code} /** * Create a secret manager * @param delegationKeyUpdateInterval the number of seconds for rolling new *secret keys. * @param delegationTokenMaxLifetime the maximum lifetime of the delegation *tokens * @param delegationTokenRenewInterval how often the tokens must be renewed * @param delegationTokenRemoverScanInterval how often the tokens are scanned *for expired tokens */ {code} 1. the number of seconds should be the number of milliseconds. 2. It's better to add time unit to the description of other parameters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538058#comment-14538058 ] Junping Du commented on YARN-3044: -- Sorry for coming late on this. Latest patch LGTM too. [~sjlee0], feel free to go ahead and commit this! However, for [~vinodkv]'s comment: bq. We can take a dual pronged approach here? That or we make the RM-publisher itself a distributed push. This sounds reasonable to me but hasn't been fully addressed in this JIRA. Shall we open a new JIRA for further discussion on this? [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Labels: BB2015-05-TBR Attachments: YARN-3044-YARN-2928.004.patch, YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, YARN-3044-YARN-2928.007.patch, YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3360) Add JMX metrics to TimelineDataManager
[ https://issues.apache.org/jira/browse/YARN-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538078#comment-14538078 ] Hadoop QA commented on YARN-3360: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 39s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 28s | The applied patch generated 19 new checkstyle issues (total was 7, now 26). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 47s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 3m 8s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | | | 38m 45s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731939/YARN-3360.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7e543c2 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7863/artifact/patchprocess/diffcheckstylehadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7863/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7863/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7863/console | This message was automatically generated. Add JMX metrics to TimelineDataManager -- Key: YARN-3360 URL: https://issues.apache.org/jira/browse/YARN-3360 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Labels: BB2015-05-TBR Attachments: YARN-3360.001.patch, YARN-3360.002.patch The TimelineDataManager currently has no metrics, outside of the standard JVM metrics. It would be very useful to at least log basic counts of method calls, time spent in those calls, and number of entities/events involved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538367#comment-14538367 ] Wangda Tan commented on YARN-3434: -- Thanks Allen! Trying it. Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Fix For: 2.8.0 Attachments: YARN-3434-branch2.7.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch ULF was set to 1.0 User was able to consume 1.4X queue capacity. It looks like when this application launched, it reserved about 1000 containers, each 8G each, within about 5 seconds. I think this allowed the logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3170) YARN architecture document needs updating
[ https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538242#comment-14538242 ] Brahma Reddy Battula commented on YARN-3170: [~aw] Thanks for taking a look at this issue. Updated the patch based on your comments. Kindly review. Let me know if any further rework is needed in the second paragraph (mainly the first line). YARN architecture document needs updating - Key: YARN-3170 URL: https://issues.apache.org/jira/browse/YARN-3170 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Allen Wittenauer Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3170-002.patch, YARN-3170-003.patch, YARN-3170-004.patch, YARN-3170-005.patch, YARN-3170.patch The marketing paragraph at the top, NextGen MapReduce, etc are all marketing rather than actual descriptions. It also needs some general updates, esp given it reads as though 0.23 was just released yesterday. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3595) Performance optimization using connection cache of Phoenix timeline writer
[ https://issues.apache.org/jira/browse/YARN-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538341#comment-14538341 ] Li Lu commented on YARN-3595: - Hi [~sjlee0], thanks for the suggestions. I think you're right that most complexities come from having a cache rather than a pool for those connections. I'll look into alternative solutions. Performance optimization using connection cache of Phoenix timeline writer -- Key: YARN-3595 URL: https://issues.apache.org/jira/browse/YARN-3595 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu The story about the connection cache in Phoenix timeline storage is a little bit long. In YARN-3033 we planned to have shared writer layer for all collectors in the same collector manager. In this way we can better reuse the same heavy-weight storage layer connection, therefore it's more friendly to conventional storage layer connections which are typically heavy-weight. Phoenix, on the other hand, implements its own connection interface layer to be light-weight, thread-unsafe. To make these connections work with our multiple collector, single writer model, we're adding a thread indexed connection cache. However, many performance critical factors are yet to be tested. In this JIRA we're tracing performance optimization efforts using this connection cache. Previously we had a draft, but there was one implementation challenge on cache evictions: There may be races between Guava cache's removal listener calls (which close the connection) and normal references to the connection. We need to carefully define the way they synchronize. Performance-wise, at the very beginning stage we may need to understand: # If the current, thread-based indexing is an appropriate approach, or we can use some better ways to index the connections. # the best size of the cache, presumably as the proposed default value of a configuration. # how long we need to preserve a connection in the cache. Please feel free to add this list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
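To make the eviction race concrete, here is a hedged sketch of a thread-indexed Guava connection cache whose removal listener closes evicted connections; the class name, JDBC URL, and tuning values are illustrative, not the actual Phoenix writer code:
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.RemovalListener;
import com.google.common.cache.RemovalNotification;

// Hypothetical thread-indexed connection cache, only to illustrate the
// eviction race discussed above.
public class PhoenixConnectionCache {

  private final LoadingCache<Long, Connection> connections =
      CacheBuilder.newBuilder()
          .maximumSize(32)                         // candidate tuning knob
          .expireAfterAccess(10, TimeUnit.MINUTES) // candidate tuning knob
          .removalListener(new RemovalListener<Long, Connection>() {
            @Override
            public void onRemoval(RemovalNotification<Long, Connection> n) {
              try {
                // The race: eviction may close this connection while the
                // owning thread is still in the middle of using it.
                n.getValue().close();
              } catch (SQLException e) {
                // log and move on
              }
            }
          })
          .build(new CacheLoader<Long, Connection>() {
            @Override
            public Connection load(Long threadId) throws SQLException {
              // One light-weight Phoenix JDBC connection per calling thread.
              return DriverManager.getConnection("jdbc:phoenix:localhost");
            }
          });

  public Connection getConnection() throws ExecutionException {
    return connections.get(Thread.currentThread().getId());
  }
}
{code}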
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538466#comment-14538466 ] Vinod Kumar Vavilapalli commented on YARN-3134: --- Tx folks, this is great progress! [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Li Lu Fix For: YARN-2928 Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, YARN-3134-YARN-2928.003.patch, YARN-3134-YARN-2928.004.patch, YARN-3134-YARN-2928.005.patch, YARN-3134-YARN-2928.006.patch, YARN-3134-YARN-2928.007.patch, YARN-3134DataSchema.pdf, hadoop-zshen-nodemanager-d-128-95-184-84.dhcp4.washington.edu.out Quoting the introduction on the Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. {code} It may simplify our implementation to read/write data from/to HBase, and we can easily build indexes and compose complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3616) determine how to generate YARN container events
[ https://issues.apache.org/jira/browse/YARN-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538303#comment-14538303 ] Naganarasimha G R commented on YARN-3616: - I would like to continue working on this issue :). Also, to capture one important point from [~Vinodkv]'s review: bq. The missing dots occur when a container's life-cycle ends either on the RM or the AM. We can take a dual pronged approach here? That or we make the RM-publisher itself a distributed push. IMO the dual-pronged approach would be better: we can rely on NMs to post the normal life-cycle events, and in the rare cases the NM cannot handle, the RM publishes events directly to ATS. A distributed push might not work here, because in the cases Vinod mentioned the NM might not be able to handle the publishing, as the TimelineCollector might not have been created since no container was created on the NM side for that app. Correct me if I am wrong. determine how to generate YARN container events --- Key: YARN-3616 URL: https://issues.apache.org/jira/browse/YARN-3616 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Naganarasimha G R The initial design called for the node manager to write YARN container events to take advantage of the distributed writes. RM acting as a sole writer of all YARN container events would have significant scalability problems. Still, there are some types of events that are not captured by the NM. The current implementation has both: RM writing container events and NM writing container events. We need to sort this out, and decide how we can write all needed container events in a scalable manner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538346#comment-14538346 ] Allen Wittenauer commented on YARN-3434: You can run test-patch.sh locally and specify the branch using --branch. Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Fix For: 2.8.0 Attachments: YARN-3434-branch2.7.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch ULF was set to 1.0 User was able to consume 1.4X queue capacity. It looks like when this application launched, it reserved about 1000 containers, each 8G each, within about 5 seconds. I think this allowed the logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3617) Fix unused variable to get CPU frequency on Windows systems
Georg Berendt created YARN-3617: --- Summary: Fix unused variable to get CPU frequency on Windows systems Key: YARN-3617 URL: https://issues.apache.org/jira/browse/YARN-3617 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows 7 x64 SP1 Reporter: Georg Berendt Priority: Minor In the class 'WindowsResourceCalculatorPlugin.java' of the YARN project, there is an unused variable for CPU frequency. /** {@inheritDoc} */ @Override public long getCpuFrequency() { refreshIfNeeded(); return -1; } Please change '-1' to use 'cpuFrequencyKhz'. org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
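A hedged sketch of the change the reporter is asking for, assuming cpuFrequencyKhz is the existing field that refreshIfNeeded() populates:
{code}
/** {@inheritDoc} */
@Override
public long getCpuFrequency() {
  refreshIfNeeded();
  return cpuFrequencyKhz; // previously hard-coded to -1, ignoring the refreshed value
}
{code}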
[jira] [Created] (YARN-3618) Fix unused variable to get CPU frequency on Windows systems
Georg Berendt created YARN-3618: --- Summary: Fix unused variable to get CPU frequency on Windows systems Key: YARN-3618 URL: https://issues.apache.org/jira/browse/YARN-3618 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows 7 x64 SP1 Reporter: Georg Berendt Priority: Minor In the class 'WindowsResourceCalculatorPlugin.java' of the YARN project, there is an unused variable for CPU frequency. /** {@inheritDoc} */ @Override public long getCpuFrequency() { refreshIfNeeded(); return -1; } Please change '-1' to use 'cpuFrequencyKhz'. org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3621) FairScheduler doesn't count AM vcores towards max-share
Karthik Kambatla created YARN-3621: -- Summary: FairScheduler doesn't count AM vcores towards max-share Key: YARN-3621 URL: https://issues.apache.org/jira/browse/YARN-3621 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.1 Reporter: Karthik Kambatla FairScheduler seems to not count AM vcores towards max-vcores. On a queue with maxVcores set to 1, I am able to run a sleep job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3622) Enable application client to communicate with new timeline service
Zhijie Shen created YARN-3622: - Summary: Enable application client to communicate with new timeline service Key: YARN-3622 URL: https://issues.apache.org/jira/browse/YARN-3622 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen YARN application has client and AM. We have the story to make TimelineClient work inside AM for v2, but not for client. TimelineClient inside app client needs to be taken care of too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps
[ https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3505: Attachment: YARN-3505.4.patch Node's Log Aggregation Report with SUCCEED should not cached in RMApps -- Key: YARN-3505 URL: https://issues.apache.org/jira/browse/YARN-3505 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation Affects Versions: 2.8.0 Reporter: Junping Du Assignee: Xuan Gong Priority: Critical Attachments: YARN-3505.1.patch, YARN-3505.2.patch, YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch Per discussions in YARN-1402, we shouldn't cache every node's log aggregation report in RMApps forever, especially those that finished with SUCCEED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3624) ApplicationHistoryServer reverses the order of the filters it gets
Mit Desai created YARN-3624: --- Summary: ApplicationHistoryServer reverses the order of the filters it gets Key: YARN-3624 URL: https://issues.apache.org/jira/browse/YARN-3624 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai ApplicationHistoryServer should not alter the order in which it gets the filter chain. Additional filters should be added at the end of the chain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3624) ApplicationHistoryServer reverses the order of the filters it gets
[ https://issues.apache.org/jira/browse/YARN-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-3624: Attachment: YARN-3624.patch Attaching the patch. ApplicationHistoryServer reverses the order of the filters it gets -- Key: YARN-3624 URL: https://issues.apache.org/jira/browse/YARN-3624 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-3624.patch ApplicationHistoryServer should not alter the order in which it gets the filter chain. Additional filters should be added at the end of the chain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538740#comment-14538740 ] Wangda Tan commented on YARN-3434: -- Ran it locally, all tests passed, committing. Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Fix For: 2.8.0 Attachments: YARN-3434-branch2.7.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch ULF was set to 1.0 User was able to consume 1.4X queue capacity. It looks like when this application launched, it reserved about 1000 containers, each 8G each, within about 5 seconds. I think this allowed the logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538437#comment-14538437 ] Zhijie Shen commented on YARN-3044: --- Sorry to put in my comments at the last minute: 1. I'm still not sure why it is necessary to have RMContainerEntity. Whether the container entity comes from the RM or the NM, it's about the container's info. Is there any reason we want to differentiate the two? At the reader side, if I want to list all containers of an app, should I return RMContainerEntity or ContainerEntity? I'm inclined to have only ContainerEntity, but RM and NM may put different info/events on it based on their knowledge. 2. Shouldn't the v1 and v2 publishers differ only at publishEvent? It seems that we duplicate more code than that. Perhaps defining and implementing SystemMetricsEvent.toTimelineEvent can further clean up the code. 3. I saw that v2 is going to send config, but where is the config coming from? Did we conclude who sends the config and how? IAC, sending config seems to be half done. Also, we can use {{entity.addConfigs(event.getConfig());}}. No need to iterate over the config collection and put each config one by one. 4. yarn.system-metrics-publisher.rm.publish.container-metrics -> yarn.rm.system-metrics-publisher.emit-container-events? {code} 374 public static final String RM_PUBLISH_CONTAINER_METRICS_ENABLED = YARN_PREFIX 375 + "system-metrics-publisher.rm.publish.container-metrics"; 376 public static final boolean DEFAULT_RM_PUBLISH_CONTAINER_METRICS_ENABLED = 377 false; {code} Moreover, I also think we should not have yarn.system-metrics-publisher.enabled either, and should reuse the existing config. And it's not limited to the RM metrics publisher, but applies to all existing ATS services. IMHO, the better practice is to reuse the existing config. And we can have a global config (or env var) timeline-service.version to determine whether the service is enabled with the v1 or v2 implementation. Anyway, it's a separate problem; I'll file a separate jira for it. 5. Methods/inner classes in SystemMetricsPublisher don't need to be changed to public. Default (package-private) access is enough to reach them? [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Labels: BB2015-05-TBR Attachments: YARN-3044-YARN-2928.004.patch, YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, YARN-3044-YARN-2928.007.patch, YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla reassigned YARN-3619: -- Assignee: Karthik Kambatla ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException --- Key: YARN-3619 URL: https://issues.apache.org/jira/browse/YARN-3619 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: Karthik Kambatla ContainerMetrics is able to unregister itself during the getMetrics method, but that method can be called by MetricsSystemImpl.sampleMetrics which is trying to iterate the sources. This leads to a ConcurrentModificationException log like this: {noformat} 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN impl.MetricsSystemImpl: java.util.ConcurrentModificationException {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538581#comment-14538581 ] Jason Lowe commented on YARN-3619: -- This appears to have been caused by YARN-2984. [~kasha] would you mind taking a look? ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException --- Key: YARN-3619 URL: https://issues.apache.org/jira/browse/YARN-3619 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Jason Lowe ContainerMetrics is able to unregister itself during the getMetrics method, but that method can be called by MetricsSystemImpl.sampleMetrics which is trying to iterate the sources. This leads to a ConcurrentModificationException log like this: {noformat} 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN impl.MetricsSystemImpl: java.util.ConcurrentModificationException {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
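For anyone unfamiliar with the failure mode, a tiny self-contained illustration of the underlying hazard (not the NodeManager code itself): structurally modifying a HashMap, e.g. a source unregistering itself, while another caller is iterating it typically fails fast with ConcurrentModificationException:
{code}
import java.util.HashMap;
import java.util.Map;

public class CmeDemo {
  public static void main(String[] args) {
    Map<String, Object> sources = new HashMap<>();
    sources.put("ContainerResource_container_01", new Object()); // hypothetical names
    sources.put("NodeManagerMetrics", new Object());

    // Simulates sampleMetrics() iterating the registered sources while a
    // source removes itself mid-iteration.
    for (Map.Entry<String, Object> entry : sources.entrySet()) {
      sources.remove("ContainerResource_container_01");
      // The next iteration step throws ConcurrentModificationException.
    }
  }
}
{code}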
[jira] [Created] (YARN-3620) MetricsSystemImpl fails to show backtrace when an error occurs
Jason Lowe created YARN-3620: Summary: MetricsSystemImpl fails to show backtrace when an error occurs Key: YARN-3620 URL: https://issues.apache.org/jira/browse/YARN-3620 Project: Hadoop YARN Issue Type: Bug Reporter: Jason Lowe Assignee: Jason Lowe While investigating YARN-3619 it was frustrating that MetricsSystemImpl was logging a ConcurrentModificationException but without any backtrace. Logging a backtrace would be very beneficial to tracking down the cause of the problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
Jason Lowe created YARN-3619: Summary: ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException Key: YARN-3619 URL: https://issues.apache.org/jira/browse/YARN-3619 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Jason Lowe ContainerMetrics is able to unregister itself during the getMetrics method, but that method can be called by MetricsSystemImpl.sampleMetrics which is trying to iterate the sources. This leads to a ConcurrentModificationException log like this: {noformat} 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN impl.MetricsSystemImpl: java.util.ConcurrentModificationException {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2921) MockRM#waitForState methods can be too slow and flaky
[ https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538956#comment-14538956 ] Wangda Tan commented on YARN-2921: -- Hi [~ozawa], Some comments: - In MockAM.waitForState, I don't quite understand the change: 1. why is minWaitMSec needed? 2. Why fail the method if {{if (waitedMsecs >= timeoutMsecs)}} is true? I think it should check the current state against the expected state. - In the two MockRM.waitForState methods, I think we should also check app.getState() instead of the time, correct? - In TestRMRestart, you can use GenericTestUtils.waitFor instead. MockRM#waitForState methods can be too slow and flaky - Key: YARN-2921 URL: https://issues.apache.org/jira/browse/YARN-2921 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0, 2.7.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi Ozawa Attachments: YARN-2921.001.patch, YARN-2921.002.patch, YARN-2921.003.patch, YARN-2921.004.patch, YARN-2921.005.patch, YARN-2921.006.patch, YARN-2921.007.patch MockRM#waitForState methods currently sleep for too long (2 seconds and 1 second). This leads to slow tests and sometimes failures if the App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
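On the GenericTestUtils.waitFor suggestion, a hedged sketch of how a wait on an RMApp state could look; the poll interval and timeout values are illustrative:
{code}
import com.google.common.base.Supplier;
import org.apache.hadoop.test.GenericTestUtils;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppState;

public class WaitForAppStateExample {
  // Polls the app state instead of sleeping for a fixed 1-2 seconds.
  public static void waitForAppState(final RMApp app, final RMAppState expected)
      throws Exception {
    GenericTestUtils.waitFor(new Supplier<Boolean>() {
      @Override
      public Boolean get() {
        return app.getState() == expected;
      }
    }, 100, 20000); // check every 100 ms, give up after 20 s
  }
}
{code}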
[jira] [Commented] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once
[ https://issues.apache.org/jira/browse/YARN-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538961#comment-14538961 ] Wangda Tan commented on YARN-3489: -- Committing. RMServerUtils.validateResourceRequests should only obtain queue info once - Key: YARN-3489 URL: https://issues.apache.org/jira/browse/YARN-3489 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Varun Saxena Labels: BB2015-05-RFC Attachments: YARN-3489.01.patch, YARN-3489.02.patch, YARN-3489.03.patch Since the label support was added we now get the queue info for each request being validated in SchedulerUtils.validateResourceRequest. If validateResourceRequests needs to validate a lot of requests at a time (e.g.: large cluster with lots of varied locality in the requests) then it will get the queue info for each request. Since we build the queue info this generates a lot of unnecessary garbage, as the queue isn't changing between requests. We should grab the queue info once and pass it down rather than building it again for each request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
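A hedged sketch of the refactor the description asks for: fetch the QueueInfo once per validation call and reuse it for every request. The method shapes below are simplified fragments, not the exact RMServerUtils/SchedulerUtils signatures, and validateResourceRequest here stands in for the per-request validator that now takes the prefetched queue info:
{code}
// Simplified fragment; signatures are illustrative.
public static void validateResourceRequests(List<ResourceRequest> asks,
    Resource maxAllocation, String queueName, YarnScheduler scheduler)
    throws InvalidResourceRequestException {
  QueueInfo queueInfo = null;
  try {
    // Fetched once, instead of once per ResourceRequest.
    queueInfo = scheduler.getQueueInfo(queueName, false, false);
  } catch (IOException e) {
    // Leave queueInfo null; label validation then falls back to defaults.
  }
  for (ResourceRequest ask : asks) {
    validateResourceRequest(ask, maxAllocation, queueInfo);
  }
}
{code}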
[jira] [Commented] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once
[ https://issues.apache.org/jira/browse/YARN-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539007#comment-14539007 ] Hudson commented on YARN-3489: -- FAILURE: Integrated in Hadoop-trunk-Commit #7800 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7800/]) YARN-3489. RMServerUtils.validateResourceRequests should only obtain queue info once. (Varun Saxena via wangda) (wangda: rev d6f6741296639a73f5306e3ebefec84a40ca03e5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java RMServerUtils.validateResourceRequests should only obtain queue info once - Key: YARN-3489 URL: https://issues.apache.org/jira/browse/YARN-3489 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Varun Saxena Labels: BB2015-05-RFC Attachments: YARN-3489.01.patch, YARN-3489.02.patch, YARN-3489.03.patch Since the label support was added we now get the queue info for each request being validated in SchedulerUtils.validateResourceRequest. If validateResourceRequests needs to validate a lot of requests at a time (e.g.: large cluster with lots of varied locality in the requests) then it will get the queue info for each request. Since we build the queue info this generates a lot of unnecessary garbage, as the queue isn't changing between requests. We should grab the queue info once and pass it down rather than building it again for each request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3617) Fix unused variable to get CPU frequency on Windows systems
[ https://issues.apache.org/jira/browse/YARN-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539018#comment-14539018 ] J.Andreina commented on YARN-3617: -- Thanks [~xafero] for reporting this issue. If you have already started working on this, please reassign it to yourself. Fix unused variable to get CPU frequency on Windows systems --- Key: YARN-3617 URL: https://issues.apache.org/jira/browse/YARN-3617 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows 7 x64 SP1 Reporter: Georg Berendt Assignee: J.Andreina Priority: Minor Original Estimate: 1h Remaining Estimate: 1h In the class 'WindowsResourceCalculatorPlugin.java' of the YARN project, there is an unused variable for CPU frequency. /** {@inheritDoc} */ @Override public long getCpuFrequency() { refreshIfNeeded(); return -1; } Please change '-1' to use 'cpuFrequencyKhz'. org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
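A sketch of the suggested fix; the field name cpuFrequencyKhz comes from the report above, while the surrounding class is a paraphrase rather than the full WindowsResourceCalculatorPlugin:
{code:java}
public class WindowsResourceCalculatorPluginSketch {
  private long cpuFrequencyKhz = -1;   // populated by refreshIfNeeded()

  private void refreshIfNeeded() {
    // would re-read the Windows system counters and update cpuFrequencyKhz here
  }

  /** {@inheritDoc} */
  public long getCpuFrequency() {
    refreshIfNeeded();
    return cpuFrequencyKhz;   // previously hard-coded to -1, leaving the field unused
  }
}
{code}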
[jira] [Resolved] (YARN-1231) Fix test cases that will hit max-am-used-resources-percent limit after YARN-276
[ https://issues.apache.org/jira/browse/YARN-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou resolved YARN-1231. - Resolution: Won't Fix YARN-2637 has fixed the problem described in YARN-276, so this ticket no longer needs to be fixed. Fix test cases that will hit max-am-used-resources-percent limit after YARN-276 Key: YARN-1231 URL: https://issues.apache.org/jira/browse/YARN-1231 Project: Hadoop YARN Issue Type: Task Affects Versions: 2.1.1-beta Reporter: Nemon Lou Assignee: Nemon Lou Labels: test Attachments: YARN-1231.patch Use a separate jira to fix YARN's test cases that will fail by hitting the max-am-used-resources-percent limit after YARN-276. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)
[ https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539061#comment-14539061 ] Mit Desai commented on YARN-2900: - I was stuck in something else. I'll update on that by tomorrow Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500) --- Key: YARN-2900 URL: https://issues.apache.org/jira/browse/YARN-2900 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2900-b2.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679) at org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218) ... 59 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3623) Having the config to indicate the timeline service version
Zhijie Shen created YARN-3623: - Summary: Having the config to indicate the timeline service version Key: YARN-3623 URL: https://issues.apache.org/jira/browse/YARN-3623 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen So far RM, MR AM, and DA AM have added/changed new configs to enable writing timeline data to the v2 server. It would be good to have a YARN timeline-service.version config, analogous to timeline-service.enable, to indicate the version of the timeline service running with the given YARN cluster. This makes it easier for users to move from v1 to v2: they don't need to change the existing config, only switch this config from v1 to v2, and each framework doesn't need its own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
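A minimal sketch of how such a config could be consumed by frameworks; the property name yarn.timeline-service.version and its default below are assumptions for illustration, not the final names chosen in this JIRA:
{code:java}
import org.apache.hadoop.conf.Configuration;

public final class TimelineVersionSketch {
  // Hypothetical key; the real constant would live in YarnConfiguration.
  public static final String TIMELINE_SERVICE_VERSION = "yarn.timeline-service.version";
  public static final float DEFAULT_TIMELINE_SERVICE_VERSION = 1.0f;

  /** Frameworks ask one question instead of carrying their own v1/v2 switches. */
  public static boolean isTimelineServiceV2(Configuration conf) {
    return conf.getFloat(TIMELINE_SERVICE_VERSION,
        DEFAULT_TIMELINE_SERVICE_VERSION) >= 2.0f;
  }
}
{code}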
[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps
[ https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538716#comment-14538716 ] Xuan Gong commented on YARN-3505: - Upload a new patch to address all the comments Node's Log Aggregation Report with SUCCEED should not cached in RMApps -- Key: YARN-3505 URL: https://issues.apache.org/jira/browse/YARN-3505 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation Affects Versions: 2.8.0 Reporter: Junping Du Assignee: Xuan Gong Priority: Critical Attachments: YARN-3505.1.patch, YARN-3505.2.patch, YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch Per discussions in YARN-1402, we shouldn't cache all node's log aggregation reports in RMApps for always, especially for those finished with SUCCEED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539174#comment-14539174 ] Hadoop QA commented on YARN-3543: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 40s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 3m 27s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 7s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 40s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 6m 25s | The patch appears to introduce 3 new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 6m 49s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 3m 7s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 52m 7s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 109m 13s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-applicationhistoryservice | | | Redundant nullcheck of app, which is known to be non-null in org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getAMContainer(ApplicationAttemptId) Redundant null check at ApplicationHistoryManagerImpl.java:is known to be non-null in org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getAMContainer(ApplicationAttemptId) Redundant null check at ApplicationHistoryManagerImpl.java:[line 96] | | | Redundant nullcheck of app, which is known to be non-null in org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ContainerId) Redundant null check at ApplicationHistoryManagerImpl.java:is known to be non-null in org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ContainerId) Redundant null check at ApplicationHistoryManagerImpl.java:[line 203] | | | Redundant nullcheck of app, which is known to be non-null in org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainers(ApplicationAttemptId) Redundant null check at ApplicationHistoryManagerImpl.java:is known to be non-null in org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainers(ApplicationAttemptId) Redundant null check at ApplicationHistoryManagerImpl.java:[line 235] | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731571/0003-YARN-3543.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3d28611 | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7872/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-applicationhistoryservice.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7872/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/7872/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7872/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7872/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7872/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7872/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results |
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3529: Attachment: YARN-3529-YARN-2928.002.patch New patch to fix the findbugs warnings. Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: AbstractMiniHBaseClusterTest.java, YARN-3529-YARN-2928.000.patch, YARN-3529-YARN-2928.001.patch, YARN-3529-YARN-2928.002.patch, output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods
[ https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3613: Attachment: YARN-3613-1.patch Please review the patch. Removed 2 unused imports. Test time reduced from ~130 to ~80 sec TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods -- Key: YARN-3613 URL: https://issues.apache.org/jira/browse/YARN-3613 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: nijel Priority: Minor Labels: newbie Attachments: YARN-3613-1.patch In TestContainerManagerSecurity, individual tests init and start Yarn cluster. This duplication can be avoided by moving that to setup. Further, one could merge the two @Test methods to avoid bringing up another mini-cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
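A sketch of the pattern the patch moves toward, with a hypothetical stand-in for the mini cluster rather than the actual TestContainerManagerSecurity code: the cluster is initialized and started in setup and stopped in teardown, so the individual @Test methods no longer each pay the start-up cost.
{code:java}
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class ClusterSetupSketch {
  /** Minimal stand-in so the sketch is self-contained. */
  static class MiniCluster {
    void init() { }
    void start() { }
    void stop() { }
  }

  private MiniCluster cluster;

  @Before
  public void setUp() {
    cluster = new MiniCluster();
    cluster.init();     // moved out of the individual test methods
    cluster.start();
  }

  @After
  public void tearDown() {
    if (cluster != null) {
      cluster.stop();
    }
  }

  @Test
  public void testMergedSecurityScenarios() {
    // the merged test bodies run against the already-started cluster here
  }
}
{code}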
[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539200#comment-14539200 ] Hadoop QA commented on YARN-2556: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 14s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 19s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 36s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 41s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | mapreduce tests | 105m 46s | Tests passed in hadoop-mapreduce-client-jobclient. | | | | 122m 20s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732105/YARN-2556.3.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 3d28611 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7873/artifact/patchprocess/whitespace.txt | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/7873/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7873/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7873/console | This message was automatically generated. Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Chang Li Labels: BB2015-05-TBR Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, YARN-2556.1.patch, YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla reassigned YARN-3619: -- Assignee: zhihai xu (was: Karthik Kambatla) Zhihai pinged me offline mentioning he knows the root cause behind this. [~zxu] - assigning this to you. ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException --- Key: YARN-3619 URL: https://issues.apache.org/jira/browse/YARN-3619 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: zhihai xu ContainerMetrics is able to unregister itself during the getMetrics method, but that method can be called by MetricsSystemImpl.sampleMetrics which is trying to iterate the sources. This leads to a ConcurrentModificationException log like this: {noformat} 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN impl.MetricsSystemImpl: java.util.ConcurrentModificationException {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538890#comment-14538890 ] Wangda Tan commented on YARN-3521: -- Thanks for updating, [~sunilg], Latest patch LGTM, +1. Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch, 0003-YARN-3521.patch, 0004-YARN-3521.patch, 0005-YARN-3521.patch, 0006-YARN-3521.patch, 0007-YARN-3521.patch In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should make the same change in REST API side to make them consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps
[ https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538897#comment-14538897 ] Hadoop QA commented on YARN-3505: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 12s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 50s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 51s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 7s | The applied patch generated 1 new checkstyle issues (total was 1, now 2). | | {color:red}-1{color} | checkstyle | 2m 22s | The applied patch generated 2 new checkstyle issues (total was 70, now 63). | | {color:green}+1{color} | whitespace | 0m 21s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 35s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 21s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 6m 10s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 51m 55s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 102m 7s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732030/YARN-3505.4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / ea11590 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7866/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt https://builds.apache.org/job/PreCommit-YARN-Build/7866/artifact/patchprocess/diffcheckstylehadoop-yarn-server-common.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7866/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7866/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7866/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7866/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7866/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7866/console | This message was automatically generated. Node's Log Aggregation Report with SUCCEED should not cached in RMApps -- Key: YARN-3505 URL: https://issues.apache.org/jira/browse/YARN-3505 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation Affects Versions: 2.8.0 Reporter: Junping Du Assignee: Xuan Gong Priority: Critical Attachments: YARN-3505.1.patch, YARN-3505.2.patch, YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch Per discussions in YARN-1402, we shouldn't cache all node's log aggregation reports in RMApps for always, especially for those finished with SUCCEED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3624) ApplicationHistoryServer reverses the order of the filters it gets
[ https://issues.apache.org/jira/browse/YARN-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538898#comment-14538898 ] Hadoop QA commented on YARN-3624: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 48s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 27s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 49s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 3m 3s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | | | 38m 59s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732032/YARN-3624.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 444836b | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7867/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7867/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7867/console | This message was automatically generated. ApplicationHistoryServer reverses the order of the filters it gets -- Key: YARN-3624 URL: https://issues.apache.org/jira/browse/YARN-3624 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-3624.patch AppliactionHistoryServer should not alter the order in which it gets the filter chain. Additional filters should be added at the end of the chain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3545) Investigate the concurrency issue with the map of timeline collector
[ https://issues.apache.org/jira/browse/YARN-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3545: Attachment: YARN-3545-YARN-2928.000.patch In this patch I'm using a concurrent hash map to replace the synchronized hash map. After removing the global lock, we need to consider two cases: concurrent putIfAbsent calls, and a concurrent putIfAbsent call racing with a get call. The putIfAbsent-vs-get case is addressed by an initialization barrier, since the contention is low. With this solution, in the best case each read only costs one volatile variable read instead of acquiring the lock inside the synchronized map. The case of multiple concurrent putIfAbsent calls is addressed by speculatively allocating a collector and trying to putIfAbsent it into the hash map. If the putIfAbsent call succeeds (returns null), we call postPut and publish the new collector to all readers. If the putIfAbsent call fails, someone else has already allocated a collector and we use that one instead. To speed up this case, I added a fast path so that putIfAbsent only tries to allocate a collector if there was no collector for it at the beginning of the method. I'd appreciate comments since I may have missed something here... Investigate the concurrency issue with the map of timeline collector Key: YARN-3545 URL: https://issues.apache.org/jira/browse/YARN-3545 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Li Lu Attachments: YARN-3545-YARN-2928.000.patch See the discussion in YARN-3390 for details. Let's continue the discussion here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
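A condensed sketch of the putIfAbsent flow described above, with hypothetical names (Collector, postPut) standing in for the real TimelineCollectorManager types:
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class CollectorMapSketch {
  static class Collector { }

  private final ConcurrentMap<String, Collector> collectors = new ConcurrentHashMap<>();

  public Collector putIfAbsent(String appId) {
    // Fast path: if a collector already exists, skip the speculative allocation entirely.
    Collector existing = collectors.get(appId);
    if (existing != null) {
      return existing;
    }
    // Slow path: speculatively allocate a collector and try to publish it.
    Collector candidate = new Collector();
    Collector raced = collectors.putIfAbsent(appId, candidate);
    if (raced == null) {
      postPut(appId, candidate);   // we won the race: finish initialization, publish to readers
      return candidate;
    }
    return raced;                  // someone else won the race: use their collector
  }

  private void postPut(String appId, Collector collector) {
    // start the collector, register it with aux services, etc. (omitted here)
  }
}
{code}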
[jira] [Commented] (YARN-3545) Investigate the concurrency issue with the map of timeline collector
[ https://issues.apache.org/jira/browse/YARN-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539001#comment-14539001 ] Hadoop QA commented on YARN-3545: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 11s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 42s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 45s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 33s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 43s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 0m 40s | The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 37m 3s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-timelineservice | | | Spinning on TimelineCollector.initialized in org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.initializationBarrier(TimelineCollector) At TimelineCollectorManager.java: At TimelineCollectorManager.java:[line 161] | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732071/YARN-3545-YARN-2928.000.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / b3b791b | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7870/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-timelineservice.html | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7870/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7870/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7870/console | This message was automatically generated. Investigate the concurrency issue with the map of timeline collector Key: YARN-3545 URL: https://issues.apache.org/jira/browse/YARN-3545 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Li Lu Attachments: YARN-3545-YARN-2928.000.patch See the discussion in YARN-3390 for details. Let's continue the discussion here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3617) Fix unused variable to get CPU frequency on Windows systems
[ https://issues.apache.org/jira/browse/YARN-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina reassigned YARN-3617: Assignee: J.Andreina Fix unused variable to get CPU frequency on Windows systems --- Key: YARN-3617 URL: https://issues.apache.org/jira/browse/YARN-3617 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows 7 x64 SP1 Reporter: Georg Berendt Assignee: J.Andreina Priority: Minor Original Estimate: 1h Remaining Estimate: 1h In the class 'WindowsResourceCalculatorPlugin.java' of the YARN project, there is an unused variable for CPU frequency. /** {@inheritDoc} */ @Override public long getCpuFrequency() { refreshIfNeeded(); return -1; } Please change '-1' to use 'cpuFrequencyKhz'. org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3493) RM fails to come up with error Failed to load/recover state when mem settings are changed
[ https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3493: - Fix Version/s: 2.7.1 RM fails to come up with error Failed to load/recover state when mem settings are changed Key: YARN-3493 URL: https://issues.apache.org/jira/browse/YARN-3493 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Reporter: Sumana Sathish Assignee: Jian He Priority: Critical Fix For: 2.8.0, 2.7.1 Attachments: YARN-3493.1.patch, YARN-3493.2.patch, YARN-3493.3.patch, YARN-3493.4.patch, YARN-3493.5.patch, yarn-yarn-resourcemanager.log.zip RM fails to come up for the following case: 1. Change yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in background and wait for the job to reach running state 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 before the above job completes 4. Restart RM 5. RM fails to come up with the below error {code:title= RM error for Mem settings changed} - RM app submission failed in validating AM resource request for application application_1429094976272_0008 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory 0, or requested memory max configured, requestedMemory=3072, maxMemory=2048 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208) 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(579)) - Failed to load/recover state org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory 0, or requested memory max configured, requestedMemory=3072, maxMemory=2048 at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3529: Attachment: YARN-3529-YARN-2928.001.patch New patch addressing [~zjshen]'s comments. I changed the maven organization for the dependency information, added one implementation-level configuration for setting up connection strings, and tear down the Phoenix server at the end of the unit test. Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: AbstractMiniHBaseClusterTest.java, YARN-3529-YARN-2928.000.patch, YARN-3529-YARN-2928.001.patch, output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539069#comment-14539069 ] Rohith commented on YARN-3543: -- [~vinodkv] Kindly review the updated patch.. ApplicationReport should be able to tell whether the Application is AM managed or not. --- Key: YARN-3543 URL: https://issues.apache.org/jira/browse/YARN-3543 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Spandan Dutta Assignee: Rohith Labels: BB2015-05-TBR Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG Currently we can know whether the application submitted by the user is AM managed from the applicationSubmissionContext. This can be only done at the time when the user submits the job. We should have access to this info from the ApplicationReport as well so that we can check whether an app is AM managed or not anytime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538878#comment-14538878 ] Wangda Tan commented on YARN-3362: -- The latest patch LGTM. Add node label usage in RM CapacityScheduler web UI --- Key: YARN-3362 URL: https://issues.apache.org/jira/browse/YARN-3362 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager, webapp Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: 2015.05.06 Folded Queues.png, 2015.05.06 Queue Expanded.png, 2015.05.07_3362_Queue_Hierarchy.png, 2015.05.10_3362_Queue_Hierarchy.png, CSWithLabelsView.png, No-space-between-Active_user_info-and-next-queues.png, Screen Shot 2015-04-29 at 11.42.17 AM.png, YARN-3362.20150428-3-modified.patch, YARN-3362.20150428-3.patch, YARN-3362.20150506-1.patch, YARN-3362.20150507-1.patch, YARN-3362.20150510-1.patch, YARN-3362.20150511-1.patch, capacity-scheduler.xml We don't have node label usage in the RM CapacityScheduler web UI now; without this, it is hard for users to understand what is happening on nodes that have labels assigned to them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
[ https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538905#comment-14538905 ] Hadoop QA commented on YARN-3625: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 58s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 36s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 45s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 27s | The applied patch generated 1 new checkstyle issues (total was 6, now 6). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 49s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 3m 12s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | | | 39m 25s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732038/YARN-3625.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 444836b | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7868/artifact/patchprocess/diffcheckstylehadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7868/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7868/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7868/console | This message was automatically generated. RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put -- Key: YARN-3625 URL: https://issues.apache.org/jira/browse/YARN-3625 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3625.1.patch RollingLevelDBTimelineStore batches all entities in the same put to improve performance. This causes an error when relating to an entity in the same put however. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)
[ https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538911#comment-14538911 ] Zhijie Shen commented on YARN-2900: --- [~mitdesai], have you got the chance to fix {{java.lang.IllegalStateException: STREAM}}? Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500) --- Key: YARN-2900 URL: https://issues.apache.org/jira/browse/YARN-2900 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2900-b2.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679) at org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218) ... 59 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3493) RM fails to come up with error Failed to load/recover state when mem settings are changed
[ https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539057#comment-14539057 ] Hudson commented on YARN-3493: -- FAILURE: Integrated in Hadoop-trunk-Commit #7801 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7801/]) Move YARN-3493 in CHANGES.txt from 2.8 to 2.7.1 (wangda: rev 3d28611cc6850de129b831158c420f9487103213) * hadoop-yarn-project/CHANGES.txt RM fails to come up with error Failed to load/recover state when mem settings are changed Key: YARN-3493 URL: https://issues.apache.org/jira/browse/YARN-3493 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Reporter: Sumana Sathish Assignee: Jian He Priority: Critical Fix For: 2.8.0, 2.7.1 Attachments: YARN-3493.1.patch, YARN-3493.2.patch, YARN-3493.3.patch, YARN-3493.4.patch, YARN-3493.5.patch, yarn-yarn-resourcemanager.log.zip RM fails to come up for the following case: 1. Change yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in background and wait for the job to reach running state 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 before the above job completes 4. Restart RM 5. RM fails to come up with the below error {code:title= RM error for Mem settings changed} - RM app submission failed in validating AM resource request for application application_1429094976272_0008 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory 0, or requested memory max configured, requestedMemory=3072, maxMemory=2048 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208) 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(579)) - 
Failed to load/recover state org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory 0, or requested memory max configured, requestedMemory=3072, maxMemory=2048 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187) at
[jira] [Updated] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
[ https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3625: -- Description: RollingLevelDBTimelineStore batches all entities in the same put to improve performance. This causes an error when relating to an entity in the same put however. RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put -- Key: YARN-3625 URL: https://issues.apache.org/jira/browse/YARN-3625 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3625.1.patch RollingLevelDBTimelineStore batches all entities in the same put to improve performance. This causes an error when relating to an entity in the same put however. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1886) Exceptions in the RM log while cleaning up app attempt
[ https://issues.apache.org/jira/browse/YARN-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved YARN-1886. --- Resolution: Duplicate Exceptions in the RM log while cleaning up app attempt -- Key: YARN-1886 URL: https://issues.apache.org/jira/browse/YARN-1886 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Noticed exceptions in the RM log while HA tests were running where we killed RM/AM/Namnode etc. RM failed over and the new active RM tried to kill the old app attempt and ran into this exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538803#comment-14538803 ] Hudson commented on YARN-3434: -- FAILURE: Integrated in Hadoop-trunk-Commit #7799 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7799/]) Moved YARN-3434. (Interaction between reservations and userlimit can result in significant ULF violation.) From 2.8.0 to 2.7.1 (wangda: rev 1952f9395870e7b631d43418e075e774b9d2) * hadoop-yarn-project/CHANGES.txt Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Fix For: 2.8.0, 2.7.1 Attachments: YARN-3434-branch2.7.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch ULF was set to 1.0. The user was able to consume 1.4X the queue capacity. It looks like when this application launched, it reserved about 1000 containers, each 8G, within about 5 seconds. I think this allowed the logic in assignToUser() to let the userlimit be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
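A small numeric illustration of the kind of overshoot described above (the figures are made up to match the scenario, and the check is a simplification, not the actual assignToUser() logic): if the limit check only looks at resources already consumed, a burst of reservations can later convert into allocations that land well past the user limit.
{code:java}
public final class UserLimitOvershootSketch {
  public static void main(String[] args) {
    long userLimitMb = 100_000;       // user limit with ULF = 1.0 of queue capacity
    long consumedMb  = 90_000;        // resources already allocated to the user
    long reservedMb  = 8_192 * 5;     // a burst of 8G container reservations

    // A check against consumed resources alone still passes...
    boolean allowed = consumedMb < userLimitMb;
    System.out.println("allowed by consumed-only check: " + allowed);

    // ...so once the reservations turn into allocations, usage exceeds the limit.
    long eventualMb = consumedMb + reservedMb;
    System.out.println("eventual usage " + eventualMb + " MB vs limit " + userLimitMb + " MB");
  }
}
{code}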
[jira] [Commented] (YARN-1886) Exceptions in the RM log while cleaning up app attempt
[ https://issues.apache.org/jira/browse/YARN-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538800#comment-14538800 ] Jian He commented on YARN-1886: --- YARN-1885 fixed this problem. close this Exceptions in the RM log while cleaning up app attempt -- Key: YARN-1886 URL: https://issues.apache.org/jira/browse/YARN-1886 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Noticed exceptions in the RM log while HA tests were running where we killed RM/AM/Namnode etc. RM failed over and the new active RM tried to kill the old app attempt and ran into this exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538849#comment-14538849 ] Craig Welch commented on YARN-3626: --- To resolve this, the situation should be detected and, when applicable, localized resources should be put at the beginning of the classpath rather than the end. On Windows localized resources are not moved to the front of the classpath when they should be -- Key: YARN-3626 URL: https://issues.apache.org/jira/browse/YARN-3626 Project: Hadoop YARN Issue Type: Bug Components: yarn Environment: Windows Reporter: Craig Welch Assignee: Craig Welch In response to the mapreduce.job.user.classpath.first setting the classpath is ordered differently so that localized resources will appear before system classpath resources when tasks execute. On Windows this does not work because the localized resources are not linked into their final location when the classpath jar is created. To compensate for that localized jar resources are added directly to the classpath generated for the jar rather than being discovered from the localized directories. Unfortunately, they are always appended to the classpath, and so are never preferred over system resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
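A sketch of the ordering fix being described, using a hypothetical helper rather than the actual Windows classpath-jar generation code: when the user-classpath-first setting is in effect, the localized jar entries are placed before the system entries instead of being appended after them.
{code:java}
import java.util.ArrayList;
import java.util.List;

public final class ClasspathOrderSketch {
  /** Builds the entry list for the generated classpath jar manifest. */
  public static List<String> buildClasspath(List<String> systemEntries,
      List<String> localizedJars, boolean userClasspathFirst) {
    List<String> classpath = new ArrayList<>();
    if (userClasspathFirst) {
      classpath.addAll(localizedJars);   // preferred over system resources
      classpath.addAll(systemEntries);
    } else {
      classpath.addAll(systemEntries);
      classpath.addAll(localizedJars);   // previous behaviour: always appended at the end
    }
    return classpath;
  }
}
{code}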
[jira] [Resolved] (YARN-3618) Fix unused variable to get CPU frequency on Windows systems
[ https://issues.apache.org/jira/browse/YARN-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula resolved YARN-3618. Resolution: Duplicate Fix unused variable to get CPU frequency on Windows systems --- Key: YARN-3618 URL: https://issues.apache.org/jira/browse/YARN-3618 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows 7 x64 SP1 Reporter: Georg Berendt Priority: Minor Labels: easyfix Original Estimate: 1h Remaining Estimate: 1h In the class 'WindowsResourceCalculatorPlugin.java' of the YARN project, there is an unused variable for CPU frequency. /** {@inheritDoc} */ @Override public long getCpuFrequency() { refreshIfNeeded(); return -1; } Please change '-1' to use 'cpuFrequencyKhz'. org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3618) Fix unused variable to get CPU frequency on Windows systems
[ https://issues.apache.org/jira/browse/YARN-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538844#comment-14538844 ] Brahma Reddy Battula commented on YARN-3618: Resolved as duplicate of YARN-3617, as both are the same. Fix unused variable to get CPU frequency on Windows systems --- Key: YARN-3618 URL: https://issues.apache.org/jira/browse/YARN-3618 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows 7 x64 SP1 Reporter: Georg Berendt Priority: Minor Labels: easyfix Original Estimate: 1h Remaining Estimate: 1h In the class 'WindowsResourceCalculatorPlugin.java' of the YARN project, there is an unused variable for CPU frequency.

    /** {@inheritDoc} */
    @Override
    public long getCpuFrequency() {
      refreshIfNeeded();
      return -1;
    }

Please change '-1' to use 'cpuFrequencyKhz'. org/apache/hadoop/yarn/util/WindowsResourceCalculatorPlugin.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
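For reference, a minimal sketch of the change the reporter asks for above: return the cpuFrequencyKhz field (kept up to date by refreshIfNeeded()) instead of the hard-coded -1. Only the field and method names quoted in the report are taken from the issue; the surrounding class is a simplified stand-in, not the actual WindowsResourceCalculatorPlugin source.

    // Simplified stand-in for the plugin class; only the relevant pieces are shown.
    abstract class CpuFrequencyReporting {
        protected long cpuFrequencyKhz;            // populated by refreshIfNeeded()

        protected abstract void refreshIfNeeded(); // queries the OS performance counters

        public long getCpuFrequency() {
            refreshIfNeeded();
            return cpuFrequencyKhz;                // previously returned -1, ignoring the field
        }
    }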
[jira] [Created] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
Craig Welch created YARN-3626: - Summary: On Windows localized resources are not moved to the front of the classpath when they should be Key: YARN-3626 URL: https://issues.apache.org/jira/browse/YARN-3626 Project: Hadoop YARN Issue Type: Bug Components: yarn Environment: Windows Reporter: Craig Welch Assignee: Craig Welch In response to the mapreduce.job.user.classpath.first setting the classpath is ordered differently so that localized resources will appear before system classpath resources when tasks execute. On Windows this does not work because the localized resources are not linked into their final location when the classpath jar is created. To compensate for that localized jar resources are added directly to the classpath generated for the jar rather than being discovered from the localized directories. Unfortunately, they are always appended to the classpath, and so are never preferred over system resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1297) Miscellaneous Fair Scheduler speedups
[ https://issues.apache.org/jira/browse/YARN-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-1297: -- Attachment: YARN-1297.4.patch Updating patch to fix the test failure
* Had missed accounting for app container recovery during scheduler recovery.
Miscellaneous Fair Scheduler speedups - Key: YARN-1297 URL: https://issues.apache.org/jira/browse/YARN-1297 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Sandy Ryza Assignee: Arun Suresh Labels: BB2015-05-TBR Attachments: YARN-1297-1.patch, YARN-1297-2.patch, YARN-1297.3.patch, YARN-1297.4.patch, YARN-1297.patch, YARN-1297.patch I ran the Fair Scheduler's core scheduling loop through a profiler tool and identified a bunch of minimally invasive changes that can shave off a few milliseconds. The main one is demoting a couple INFO log messages to DEBUG, which brought my benchmark down from 16000 ms to 6000 ms. A few others (which had way less of an impact) were:
* Most of the time in comparisons was being spent in Math.signum. I switched this to direct ifs and elses and it halved the percent of time spent in comparisons.
* I removed some unnecessary instantiations of Resource objects.
* I made it so that queues' usage wasn't calculated from the applications up each time getResourceUsage was called.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
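As a rough illustration of the Math.signum item above: a comparator that wraps a floating-point difference in Math.signum pays for the subtraction and the signum call on every comparison, while direct if/else branches on the two values avoid both. The comparator below is a generic example of that rewrite; it is not the actual FairShareComparator from the patch.

    import java.util.Comparator;

    // Hypothetical example: compare two demand values without going through Math.signum.
    final class DemandComparator implements Comparator<Double> {
        @Override
        public int compare(Double left, Double right) {
            // Equivalent in effect to (int) Math.signum(left - right), but avoids the
            // floating-point subtraction and signum call on the scheduler's hot path.
            if (left < right) {
                return -1;
            } else if (left > right) {
                return 1;
            }
            return 0;
        }
    }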
[jira] [Updated] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
[ https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3625: -- Attachment: YARN-3625.1.patch RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put -- Key: YARN-3625 URL: https://issues.apache.org/jira/browse/YARN-3625 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3625.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2000) Fix ordering of starting services inside the RM
[ https://issues.apache.org/jira/browse/YARN-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved YARN-2000. --- Resolution: Invalid Fix ordering of starting services inside the RM --- Key: YARN-2000 URL: https://issues.apache.org/jira/browse/YARN-2000 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He The order of starting services in the RM would be:
- Recovery of the app/attempts
- Start the scheduler and add scheduler app/attempts
- Start ResourceTrackerService and re-populate the containers in the scheduler based on the container info from NMs
- ApplicationMasterService either does not start, or starts but blocks until all the previous NMs register.
Other than these, there are other services such as ClientRMService and the webapps whose startup order we need to think about too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2000) Fix ordering of starting services inside the RM
[ https://issues.apache.org/jira/browse/YARN-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538822#comment-14538822 ] Jian He commented on YARN-2000: --- bq. Probably we can have state-store stop last so that all the other services are stopped first and won't accept more requests and send events to state-store. Even if the state-store stops first, API calls such as submitApplication won't return successfully until the state-store operation completes. Nothing to be done; closing. Fix ordering of starting services inside the RM --- Key: YARN-2000 URL: https://issues.apache.org/jira/browse/YARN-2000 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He The order of starting services in the RM would be:
- Recovery of the app/attempts
- Start the scheduler and add scheduler app/attempts
- Start ResourceTrackerService and re-populate the containers in the scheduler based on the container info from NMs
- ApplicationMasterService either does not start, or starts but blocks until all the previous NMs register.
Other than these, there are other services such as ClientRMService and the webapps whose startup order we need to think about too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538836#comment-14538836 ] Wangda Tan commented on YARN-3362: -- Hi Naga, thanks for updating. 1) To your questions: https://issues.apache.org/jira/browse/YARN-3362?focusedCommentId=14537181page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14537181. You can refer to YARN-2824 for more information about why the default capacity of a labeled resource is set to zero. The default max-capacity is 100 because a queue can then use such resources without configuring them. Let me know if you have more questions. 2) About showing resources of partitions, I think it's very helpful. You can include the used resource of each partition as well; you can file a separate ticket if it is hard to add within this ticket. 3) About Hide Hierarchy, I think it's good for queue capacity comparison, but an admin may get confused after checking Hide Hierarchy; it would be better to add it somewhere else rather than modify the queue UI itself. Add node label usage in RM CapacityScheduler web UI --- Key: YARN-3362 URL: https://issues.apache.org/jira/browse/YARN-3362 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager, webapp Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: 2015.05.06 Folded Queues.png, 2015.05.06 Queue Expanded.png, 2015.05.07_3362_Queue_Hierarchy.png, 2015.05.10_3362_Queue_Hierarchy.png, CSWithLabelsView.png, No-space-between-Active_user_info-and-next-queues.png, Screen Shot 2015-04-29 at 11.42.17 AM.png, YARN-3362.20150428-3-modified.patch, YARN-3362.20150428-3.patch, YARN-3362.20150506-1.patch, YARN-3362.20150507-1.patch, YARN-3362.20150510-1.patch, YARN-3362.20150511-1.patch, capacity-scheduler.xml We don't have node label usage in the RM CapacityScheduler web UI now; without this, it is hard for users to understand what happened to nodes that have labels assigned to them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
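To make the defaults in point 1) concrete: for an accessible node label, a queue's capacity on that label defaults to 0 (the queue is guaranteed nothing on the partition unless it is configured), while its maximum-capacity on the label defaults to 100 (the queue may still use the labeled resource without any explicit configuration). Below is a minimal sketch of setting these explicitly, assuming the usual CapacityScheduler per-label property names; the queue root.a and label x are invented for the example.

    import org.apache.hadoop.conf.Configuration;

    // Hypothetical illustration of the per-label capacity settings discussed above.
    public class LabelCapacityExample {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Guaranteed share of label 'x' for queue root.a; defaults to 0 if unset.
            conf.set("yarn.scheduler.capacity.root.a.accessible-node-labels.x.capacity", "30");
            // Upper bound on label 'x' for queue root.a; defaults to 100 if unset.
            conf.set("yarn.scheduler.capacity.root.a.accessible-node-labels.x.maximum-capacity", "100");
            System.out.println(conf.get("yarn.scheduler.capacity.root.a.accessible-node-labels.x.capacity"));
        }
    }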
[jira] [Updated] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3626: -- Attachment: YARN-3626.0.patch The attached patch propagates the conditional as a yarn configuration option and moves localized resources to the front of the classpath when appropriate On Windows localized resources are not moved to the front of the classpath when they should be -- Key: YARN-3626 URL: https://issues.apache.org/jira/browse/YARN-3626 Project: Hadoop YARN Issue Type: Bug Components: yarn Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3626.0.patch In response to the mapreduce.job.user.classpath.first setting the classpath is ordered differently so that localized resources will appear before system classpath resources when tasks execute. On Windows this does not work because the localized resources are not linked into their final location when the classpath jar is created. To compensate for that localized jar resources are added directly to the classpath generated for the jar rather than being discovered from the localized directories. Unfortunately, they are always appended to the classpath, and so are never preferred over system resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
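A minimal sketch of the general idea in the patch summary above: when a user-classpath-first option is set, entries known to be localized resources are placed ahead of the system entries rather than appended after them. The helper name, parameters, and list handling below are illustrative only and are not taken from the YARN-3626 patch.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical helper: order a classpath so localized entries come first when
    // the "user classpath first" behaviour is requested.
    final class ClasspathOrdering {
        static List<String> order(List<String> systemEntries,
                                  List<String> localizedEntries,
                                  boolean userClasspathFirst) {
            List<String> classpath = new ArrayList<>();
            if (userClasspathFirst) {
                // Localized resources take precedence over system resources.
                classpath.addAll(localizedEntries);
                classpath.addAll(systemEntries);
            } else {
                // Default behaviour: system resources first, localized appended last.
                classpath.addAll(systemEntries);
                classpath.addAll(localizedEntries);
            }
            return classpath;
        }
    }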
[jira] [Created] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
Jonathan Eagles created YARN-3625: - Summary: RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put Key: YARN-3625 URL: https://issues.apache.org/jira/browse/YARN-3625 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles -- This message was sent by Atlassian JIRA (v6.3.4#6332)