[jira] [Commented] (YARN-2912) Jersey Tests failing with port in use
[ https://issues.apache.org/jira/browse/YARN-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242485#comment-14242485 ] Steve Loughran commented on YARN-2912: -- let's try it and see +1 Jersey Tests failing with port in use - Key: YARN-2912 URL: https://issues.apache.org/jira/browse/YARN-2912 Project: Hadoop YARN Issue Type: Bug Components: test Affects Versions: 3.0.0 Environment: jenkins on java 8 Reporter: Steve Loughran Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2912.patch Jersey tests like TestNMWebServices apps are failing with port in use. The jersey test runner appears to always use the same port unless a system property is set to point to a different one. Every test should really be changing that sysprop in a @Before method -- This message was sent by Atlassian JIRA (v6.3.4#6332)
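As the report suggests, each Jersey-based test can pick a fresh port in a @Before method instead of relying on the framework default. A minimal sketch of that idea, assuming the test runner reads its port from a system property; the name "jersey.test.port" is illustrative and not confirmed for the Jersey version YARN uses:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Sketch of per-test port selection: bind port 0 so the OS hands out a free
// ephemeral port, then publish it via a system property before the Jersey
// test container starts. The property name "jersey.test.port" is an
// assumption for illustration; check the Jersey test framework in use.
public class FreePortSetup {
    static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Body of a hypothetical @Before method in each Jersey-based test class.
    static void setUp() {
        System.setProperty("jersey.test.port", String.valueOf(findFreePort()));
    }

    public static void main(String[] args) {
        setUp();
        System.out.println("jersey.test.port=" + System.getProperty("jersey.test.port"));
    }
}
```

Binding port 0 delegates the choice to the OS, which avoids the race where two concurrent test JVMs pick the same hard-coded port.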
[jira] [Commented] (YARN-2917) Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyscDispatcher#serviceStop from shutdown hook
[ https://issues.apache.org/jira/browse/YARN-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242519#comment-14242519 ] Rohith commented on YARN-2917: -- Thanks for your explanation about the necessity of draining events. I do not think there are any other side effects. I will upload a new patch soon. Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyscDispatcher#serviceStop from shutdown hook Key: YARN-2917 URL: https://issues.apache.org/jira/browse/YARN-2917 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Attachments: 0001-YARN-2917.patch I encountered a scenario where the RM hung while shutting down and kept logging {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2917) Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyscDispatcher#serviceStop from shutdown hook
[ https://issues.apache.org/jira/browse/YARN-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2917: - Attachment: 0002-YARN-2917.patch Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyscDispatcher#serviceStop from shutdown hook Key: YARN-2917 URL: https://issues.apache.org/jira/browse/YARN-2917 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Attachments: 0001-YARN-2917.patch, 0002-YARN-2917.patch I encountered a scenario where the RM hung while shutting down and kept logging {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2917) Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyscDispatcher#serviceStop from shutdown hook
[ https://issues.apache.org/jira/browse/YARN-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242534#comment-14242534 ] Rohith commented on YARN-2917: -- Kindly review the attached patch. I have manually verified the problematic scenario, as mentioned in my [previous comment|https://issues.apache.org/jira/browse/YARN-2917?focusedCommentId=14233971page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14233971], by deploying on a 1-node cluster. It is able to shut down gracefully, calling the ShutdownHook. Not related to this jira: I observed that ResourceManager#rmDispatcher does not drain; is it a bug? Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyscDispatcher#serviceStop from shutdown hook Key: YARN-2917 URL: https://issues.apache.org/jira/browse/YARN-2917 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Attachments: 0001-YARN-2917.patch, 0002-YARN-2917.patch I encountered a scenario where the RM hung while shutting down and kept logging {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
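For readers following along: the hang occurs because the shutdown hook runs serviceStop(), which waits for the dispatcher to drain, while the dispatch thread that called System.exit() is itself blocked waiting for the shutdown hooks to complete. A toy sketch of one way to break that cycle, using a flag set before exiting; all names here (MiniDispatcher, exitingFromDispatch) are illustrative and not taken from the actual patch:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model of the hang: serviceStop() (run from a shutdown hook) waits for
// the event queue to drain, but the dispatch thread that would drain it is
// blocked inside System.exit() waiting for the shutdown hooks themselves.
// A volatile flag set before exiting lets serviceStop() give up draining.
// All names here are illustrative, not taken from the actual patch.
public class MiniDispatcher {
    final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<>();
    volatile boolean exitingFromDispatch = false;

    // Dispatch thread hits a fatal error: record the fact *before* calling
    // System.exit(), so the hook-driven stop does not wait on this thread.
    void onDispatchError() {
        exitingFromDispatch = true;
        // System.exit(-1); // would trigger the shutdown hook -> serviceStop()
    }

    // Without checking exitingFromDispatch, this loop would spin forever
    // whenever the dispatcher itself initiated the JVM shutdown.
    void serviceStop() {
        while (!eventQueue.isEmpty() && !exitingFromDispatch) {
            Thread.yield();
        }
    }

    public static void main(String[] args) {
        MiniDispatcher d = new MiniDispatcher();
        d.eventQueue.add(() -> { });  // a pending event nobody will drain
        d.onDispatchError();
        d.serviceStop();              // returns instead of hanging
        System.out.println("stopped cleanly");
    }
}
```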
[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose
[ https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242549#comment-14242549 ] Hadoop QA commented on YARN-2356: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682693/0002-YARN-2356.patch against trunk revision 390642a. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6084//console This message is automatically generated. yarn status command for non-existent application/application attempt/container is too verbose -- Key: YARN-2356 URL: https://issues.apache.org/jira/browse/YARN-2356 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Sunil G Assignee: Sunil G Priority: Minor Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, Yarn-2356.1.patch *yarn application -status* or *applicationattempt -status* or *container -status* commands can suppress exceptions such as ApplicationNotFound, ApplicationAttemptNotFound and ContainerNotFound for non-existent entries in the RM or History Server. For example, the exception below can be suppressed better: sunildev@host-a:~/hadoop/hadoop/bin ./yarn application -status application_1402668848165_0015 No GC_PROFILE is given. Defaults to medium. 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at /10.18.40.77:45022 Exception in thread main org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1402668848165_0015' doesn't exist in RM. 
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) 
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at $Proxy12.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:76) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException): Application with
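The improvement being asked for is to catch the not-found exception in the CLI and print a single line rather than the whole remote stack trace. A self-contained sketch of that behavior; FakeClient and its exception are stand-ins for the YARN client API, not the real classes:

```java
// Sketch of the suggested behavior: catch the "not found" exception in the
// CLI and print a single-line message instead of the full stack trace.
// FakeClient and its exception are stand-ins, not the real YARN client API.
public class StatusCommandSketch {
    static class ApplicationNotFoundException extends Exception {
        ApplicationNotFoundException(String msg) { super(msg); }
    }

    static class FakeClient {
        // Stand-in for a client call like YarnClient#getApplicationReport.
        String getApplicationReport(String appId) throws ApplicationNotFoundException {
            throw new ApplicationNotFoundException(
                "Application with id '" + appId + "' doesn't exist in RM.");
        }
    }

    // Returns an exit code; on "not found" it prints one concise line.
    static int printStatus(FakeClient client, String appId) {
        try {
            System.out.println(client.getApplicationReport(appId));
            return 0;
        } catch (ApplicationNotFoundException e) {
            System.err.println("Application " + appId + " not found.");
            return -1;
        }
    }

    public static void main(String[] args) {
        int rc = printStatus(new FakeClient(), "application_1402668848165_0015");
        System.out.println("exit code: " + rc);
    }
}
```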
[jira] [Commented] (YARN-2917) Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyscDispatcher#serviceStop from shutdown hook
[ https://issues.apache.org/jira/browse/YARN-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242552#comment-14242552 ] Hadoop QA commented on YARN-2917: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686586/0002-YARN-2917.patch against trunk revision 390642a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 25 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6083//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6083//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6083//console This message is automatically generated. 
Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyscDispatcher#serviceStop from shutdown hook Key: YARN-2917 URL: https://issues.apache.org/jira/browse/YARN-2917 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Attachments: 0001-YARN-2917.patch, 0002-YARN-2917.patch I encountered a scenario where the RM hung while shutting down and kept logging {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2917) Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyscDispatcher#serviceStop from shutdown hook
[ https://issues.apache.org/jira/browse/YARN-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242564#comment-14242564 ] Rohith commented on YARN-2917: -- The Findbugs warnings are unrelated to this patch. Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyscDispatcher#serviceStop from shutdown hook Key: YARN-2917 URL: https://issues.apache.org/jira/browse/YARN-2917 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Attachments: 0001-YARN-2917.patch, 0002-YARN-2917.patch I encountered a scenario where the RM hung while shutting down and kept logging {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2437) start-yarn.sh/stop-yarn should give info
[ https://issues.apache.org/jira/browse/YARN-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242578#comment-14242578 ] Hudson commented on YARN-2437: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #35 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/35/]) YARN-2437. start-yarn.sh/stop-yarn should give info (Varun Saxena via aw) (aw: rev 59cb8b9123fac725660fc7cfbaaad3d1aa3e3bd7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/bin/start-yarn.sh * hadoop-yarn-project/hadoop-yarn/bin/stop-yarn.sh start-yarn.sh/stop-yarn should give info Key: YARN-2437 URL: https://issues.apache.org/jira/browse/YARN-2437 Project: Hadoop YARN Issue Type: Improvement Components: scripts Reporter: Allen Wittenauer Assignee: Varun Saxena Labels: newbie Fix For: 3.0.0 Attachments: YARN-2437.001.patch, YARN-2437.002.patch, YARN-2437.patch With the merger and cleanup of the daemon launch code, yarn-daemons.sh no longer prints Starting information. This should be made more of an analog of start-dfs.sh/stop-dfs.sh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2437) start-yarn.sh/stop-yarn should give info
[ https://issues.apache.org/jira/browse/YARN-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242592#comment-14242592 ] Hudson commented on YARN-2437: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1969 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/]) YARN-2437. start-yarn.sh/stop-yarn should give info (Varun Saxena via aw) (aw: rev 59cb8b9123fac725660fc7cfbaaad3d1aa3e3bd7) * hadoop-yarn-project/hadoop-yarn/bin/start-yarn.sh * hadoop-yarn-project/hadoop-yarn/bin/stop-yarn.sh * hadoop-yarn-project/CHANGES.txt start-yarn.sh/stop-yarn should give info Key: YARN-2437 URL: https://issues.apache.org/jira/browse/YARN-2437 Project: Hadoop YARN Issue Type: Improvement Components: scripts Reporter: Allen Wittenauer Assignee: Varun Saxena Labels: newbie Fix For: 3.0.0 Attachments: YARN-2437.001.patch, YARN-2437.002.patch, YARN-2437.patch With the merger and cleanup of the daemon launch code, yarn-daemons.sh no longer prints Starting information. This should be made more of an analog of start-dfs.sh/stop-dfs.sh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose
[ https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-2356: Target Version/s: 2.7.0 yarn status command for non-existent application/application attempt/container is too verbose -- Key: YARN-2356 URL: https://issues.apache.org/jira/browse/YARN-2356 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Sunil G Assignee: Sunil G Priority: Minor Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, Yarn-2356.1.patch *yarn application -status* or *applicationattempt -status* or *container -status* commands can suppress exceptions such as ApplicationNotFound, ApplicationAttemptNotFound and ContainerNotFound for non-existent entries in the RM or History Server. For example, the exception below can be suppressed better: sunildev@host-a:~/hadoop/hadoop/bin ./yarn application -status application_1402668848165_0015 No GC_PROFILE is given. Defaults to medium. 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at /10.18.40.77:45022 Exception in thread main org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1402668848165_0015' doesn't exist in RM. 
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) 
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at $Proxy12.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:76) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException): Application with id 'application_1402668848165_0015' doesn't exist in RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2437) start-yarn.sh/stop-yarn should give info
[ https://issues.apache.org/jira/browse/YARN-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242666#comment-14242666 ] Hudson commented on YARN-2437: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #39 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/39/]) YARN-2437. start-yarn.sh/stop-yarn should give info (Varun Saxena via aw) (aw: rev 59cb8b9123fac725660fc7cfbaaad3d1aa3e3bd7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/bin/start-yarn.sh * hadoop-yarn-project/hadoop-yarn/bin/stop-yarn.sh start-yarn.sh/stop-yarn should give info Key: YARN-2437 URL: https://issues.apache.org/jira/browse/YARN-2437 Project: Hadoop YARN Issue Type: Improvement Components: scripts Reporter: Allen Wittenauer Assignee: Varun Saxena Labels: newbie Fix For: 3.0.0 Attachments: YARN-2437.001.patch, YARN-2437.002.patch, YARN-2437.patch With the merger and cleanup of the daemon launch code, yarn-daemons.sh no longer prints Starting information. This should be made more of an analog of start-dfs.sh/stop-dfs.sh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2946) Deadlock in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2946: - Attachment: TestYARN2946.java Deadlock in ZKRMStateStore -- Key: YARN-2946 URL: https://issues.apache.org/jira/browse/YARN-2946 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: TestYARN2946.java Found one deadlock in ZKRMStateStore. # In the initial stage, zkClient is null because of a ZK Disconnected event. # While ZKRMStateStore#runWithCheck() waits (wait(zkSessionTimeout)) for zkClient to re-establish the ZooKeeper connection via either a SyncConnected or an Expired event, it is highly possible that another thread obtains the lock on {{ZKRMStateStore.this}} through state machine transition events. This causes a deadlock in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2946) Deadlock in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2946: - Attachment: 0001-YARN-2946.patch Deadlock in ZKRMStateStore -- Key: YARN-2946 URL: https://issues.apache.org/jira/browse/YARN-2946 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-2946.patch, TestYARN2946.java Found one deadlock in ZKRMStateStore. # In the initial stage, zkClient is null because of a ZK Disconnected event. # While ZKRMStateStore#runWithCheck() waits (wait(zkSessionTimeout)) for zkClient to re-establish the ZooKeeper connection via either a SyncConnected or an Expired event, it is highly possible that another thread obtains the lock on {{ZKRMStateStore.this}} through state machine transition events. This causes a deadlock in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2437) start-yarn.sh/stop-yarn should give info
[ https://issues.apache.org/jira/browse/YARN-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242690#comment-14242690 ] Hudson commented on YARN-2437: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1989 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1989/]) YARN-2437. start-yarn.sh/stop-yarn should give info (Varun Saxena via aw) (aw: rev 59cb8b9123fac725660fc7cfbaaad3d1aa3e3bd7) * hadoop-yarn-project/hadoop-yarn/bin/start-yarn.sh * hadoop-yarn-project/hadoop-yarn/bin/stop-yarn.sh * hadoop-yarn-project/CHANGES.txt start-yarn.sh/stop-yarn should give info Key: YARN-2437 URL: https://issues.apache.org/jira/browse/YARN-2437 Project: Hadoop YARN Issue Type: Improvement Components: scripts Reporter: Allen Wittenauer Assignee: Varun Saxena Labels: newbie Fix For: 3.0.0 Attachments: YARN-2437.001.patch, YARN-2437.002.patch, YARN-2437.patch With the merger and cleanup of the daemon launch code, yarn-daemons.sh no longer prints Starting information. This should be made more of an analog of start-dfs.sh/stop-dfs.sh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2946) Deadlock in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242695#comment-14242695 ] Rohith commented on YARN-2946: -- I wrote a small program (TestYARN2946.java, attached) to simulate the exact deadlock scenario. For easier understanding, I used the same naming conventions and implementation logic as the classes involved in the deadlock. Running TestYARN2946.java with the synchronized keyword on method updateFencedState() causes the deadlock. After the fix, i.e. removing the synchronized keyword, the program runs through the while loop without deadlock. This is only a simulation. In the attached patch, I have made 2 changes: # Removed the *synchronized* keyword from method updateFencedState(). # Changed the modifier of method updateFencedState() from public to private, since it is used only from method notifyStoreOperationFailed(). Kindly review the analysis and the attached patch. Deadlock in ZKRMStateStore -- Key: YARN-2946 URL: https://issues.apache.org/jira/browse/YARN-2946 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-2946.patch, TestYARN2946.java Found one deadlock in ZKRMStateStore. # In the initial stage, zkClient is null because of a ZK Disconnected event. # While ZKRMStateStore#runWithCheck() waits (wait(zkSessionTimeout)) for zkClient to re-establish the ZooKeeper connection via either a SyncConnected or an Expired event, it is highly possible that another thread obtains the lock on {{ZKRMStateStore.this}} through state machine transition events. This causes a deadlock in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
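As a generic illustration of the fix pattern described in the comment above (not the actual ZKRMStateStore source): if the helper only touches state that is already thread-safe, dropping its synchronized keyword means callers no longer need the store-wide monitor that another thread may be holding while it waits on the ZooKeeper connection:

```java
// Generic sketch of the class of fix described above, not the actual
// ZKRMStateStore source: a helper whose state is already thread-safe
// (a volatile flag) does not need the object-wide monitor, so removing
// "synchronized" keeps it callable even while another thread holds the lock
// waiting on an external event (e.g. a ZooKeeper reconnect).
public class StoreSketch {
    private volatile boolean fenced = false;

    // Before the fix this would have been "private synchronized void
    // updateFencedState()", forcing callers to acquire the StoreSketch monitor.
    private void updateFencedState() {
        fenced = true; // a volatile write is safe without the monitor
    }

    void notifyStoreOperationFailed() {
        updateFencedState();
    }

    boolean isFenced() {
        return fenced;
    }

    public static void main(String[] args) {
        StoreSketch s = new StoreSketch();
        s.notifyStoreOperationFailed();
        System.out.println("fenced=" + s.isFenced());
    }
}
```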
[jira] [Commented] (YARN-2946) Deadlock in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242694#comment-14242694 ] Rohith commented on YARN-2946: -- Thanks [~varun_saxena] for your suggestion. Deadlock in ZKRMStateStore -- Key: YARN-2946 URL: https://issues.apache.org/jira/browse/YARN-2946 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-2946.patch, TestYARN2946.java Found one deadlock in ZKRMStateStore. # In the initial stage, zkClient is null because of a ZK Disconnected event. # While ZKRMStateStore#runWithCheck() waits (wait(zkSessionTimeout)) for zkClient to re-establish the ZooKeeper connection via either a SyncConnected or an Expired event, it is highly possible that another thread obtains the lock on {{ZKRMStateStore.this}} through state machine transition events. This causes a deadlock in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1935) Security for timeline server
[ https://issues.apache.org/jira/browse/YARN-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242764#comment-14242764 ] Hitesh Shah commented on YARN-1935: --- [~vinodkv] [~zjshen] Wasn't most of the secure support for timeline with respect to application data already introduced in 2.5 and 2.6? If yes, does this jira need to be closed out as it confuses users as to whether Timeline is/isn't supported in a secure environment? Security for timeline server Key: YARN-1935 URL: https://issues.apache.org/jira/browse/YARN-1935 Project: Hadoop YARN Issue Type: New Feature Reporter: Arun C Murthy Assignee: Zhijie Shen Attachments: Timeline Security Diagram.pdf, Timeline_Kerberos_DT_ACLs.2.patch, Timeline_Kerberos_DT_ACLs.patch Jira to track work to secure the ATS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2912) Jersey Tests failing with port in use
[ https://issues.apache.org/jira/browse/YARN-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242771#comment-14242771 ] Hadoop QA commented on YARN-2912: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685569/YARN-2912.patch against trunk revision 390642a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 14 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 70 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6085//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6085//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6085//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-applicationhistoryservice.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6085//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6085//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6085//console This message is automatically generated. Jersey Tests failing with port in use - Key: YARN-2912 URL: https://issues.apache.org/jira/browse/YARN-2912 Project: Hadoop YARN Issue Type: Bug Components: test Affects Versions: 3.0.0 Environment: jenkins on java 8 Reporter: Steve Loughran Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2912.patch Jersey tests like TestNMWebServices apps are failing with port in use. The jersey test runner appears to always use the same port unless a system property is set to point to a different one. Every test should really be changing that sysprop in a @Before method -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2950) Change message to mandate, not suggest JS requirement on UI
[ https://issues.apache.org/jira/browse/YARN-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242829#comment-14242829 ] Harsh J commented on YARN-2950: --- The file containing the message is {{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java}} Change message to mandate, not suggest JS requirement on UI --- Key: YARN-2950 URL: https://issues.apache.org/jira/browse/YARN-2950 Project: Hadoop YARN Issue Type: Improvement Components: webapp Reporter: Harsh J Priority: Minor Most of YARN's UIs do not work with JavaScript disabled on the browser, because they appear to send back data as JS arrays instead of within the actual HTML content. The JQueryUI prints only a mild warning about this, suggesting that {{This page works best with javascript enabled.}}, when in fact it ought to be {{This page will not function without javascript enabled. Please enable javascript on your browser.}} or something similar (more direct). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2950) Change message to mandate, not suggest JS requirement on UI
Harsh J created YARN-2950: - Summary: Change message to mandate, not suggest JS requirement on UI Key: YARN-2950 URL: https://issues.apache.org/jira/browse/YARN-2950 Project: Hadoop YARN Issue Type: Improvement Components: webapp Reporter: Harsh J Priority: Minor Most of YARN's UIs do not work with JavaScript disabled on the browser, because they appear to send back data as JS arrays instead of within the actual HTML content. The JQueryUI prints only a mild warning about this, suggesting that {{This page works best with javascript enabled.}}, when in fact it ought to be {{This page will not function without javascript enabled. Please enable javascript on your browser.}} or something similar (more direct). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2950) Change message to mandate, not suggest JS requirement on UI
[ https://issues.apache.org/jira/browse/YARN-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-2950: -- Labels: newbie (was: ) Change message to mandate, not suggest JS requirement on UI --- Key: YARN-2950 URL: https://issues.apache.org/jira/browse/YARN-2950 Project: Hadoop YARN Issue Type: Improvement Components: webapp Reporter: Harsh J Priority: Minor Labels: newbie Most of YARN's UIs do not work with JavaScript disabled in the browser, because they appear to send back data as JS arrays instead of within the actual HTML content. The JQueryUI prints only a mild warning about this, suggesting that {{This page works best with javascript enabled.}}, when in fact it ought to be {{This page will not function without javascript enabled. Please enable javascript on your browser.}} or something similarly direct. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
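For illustration, a minimal sketch of the proposed wording change. The class name, constants, and surrounding markup here are hypothetical, not the actual JQueryUI.java source; the real fix would edit the warning string emitted by that file:

```java
// Hypothetical sketch -- the real JQueryUI.java structure differs.
public class JsRequiredMessage {
    // Current, suggestive wording:
    static final String OLD_MSG = "This page works best with javascript enabled.";
    // Proposed mandatory wording:
    static final String NEW_MSG =
        "This page will not function without javascript enabled."
        + " Please enable javascript on your browser.";

    // Wrapping the warning in <noscript> means only JS-disabled browsers see it.
    static String noscriptBlock() {
        return "<noscript><div class=\"ui-state-error\">" + NEW_MSG + "</div></noscript>";
    }

    public static void main(String[] args) {
        System.out.println(noscriptBlock());
    }
}
```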
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242846#comment-14242846 ] Wangda Tan commented on YARN-2637: -- [~cwelch], I will take a look at this patch today as well. Thanks, maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it checks whether the app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId() + " from user: "
        + application.getUser() + " activated in queue: " + getQueueName());
  }
}
{code}
An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the number of AMs that can be launched is 200, and if a user asks for 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all the resource of the queue instead of only max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
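The violation in the example above is plain arithmetic; this standalone sketch (illustration only, not YARN code, with the numbers taken from the description) shows how counting AMs against minimum_allocation lets actual AM usage blow past the 20% cap:

```java
// Illustration of the maximum-am-resource-percent violation described in YARN-2637.
public class AmLimitExample {
    // How the limit is computed today: a COUNT of AMs derived from minimum_allocation.
    static int maxAmNumber(int queueCapacityMb, double maxAmResourcePercent,
                           int minimumAllocationMb) {
        double maxAmResourceMb = queueCapacityMb * maxAmResourcePercent;
        return (int) (maxAmResourceMb / minimumAllocationMb);
    }

    // What those AMs actually consume when each one asks for more than the minimum.
    static int actualAmUsageMb(int queueCapacityMb, double maxAmResourcePercent,
                               int minimumAllocationMb, int perAmMb) {
        return maxAmNumber(queueCapacityMb, maxAmResourcePercent, minimumAllocationMb)
            * perAmMb;
    }

    public static void main(String[] args) {
        int queueMb = 1024;      // 1G queue
        int ams = maxAmNumber(queueMb, 0.2, 1);          // ~204 AMs allowed
        int usedMb = actualAmUsageMb(queueMb, 0.2, 1, 5); // 5M per AM (> minimum)
        // Nearly the whole queue is consumed by AMs instead of ~20% of it.
        System.out.println(ams + " AMs allowed, " + usedMb + "M actually used of "
            + queueMb + "M");
    }
}
```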
[jira] [Updated] (YARN-2950) Change message to mandate, not suggest JS requirement on UI
[ https://issues.apache.org/jira/browse/YARN-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-2950: -- Affects Version/s: 2.5.0 Change message to mandate, not suggest JS requirement on UI --- Key: YARN-2950 URL: https://issues.apache.org/jira/browse/YARN-2950 Project: Hadoop YARN Issue Type: Improvement Components: webapp Affects Versions: 2.5.0 Reporter: Harsh J Priority: Minor Labels: newbie Most of YARN's UIs do not work with JavaScript disabled in the browser, because they appear to send back data as JS arrays instead of within the actual HTML content. The JQueryUI prints only a mild warning about this, suggesting that {{This page works best with javascript enabled.}}, when in fact it ought to be {{This page will not function without javascript enabled. Please enable javascript on your browser.}} or something similarly direct. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2917) Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyscDispatcher#serviceStop from shutdown hook
[ https://issues.apache.org/jira/browse/YARN-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242853#comment-14242853 ] Jian He commented on YARN-2917: --- Patch looks good, thanks Rohith! Committing this. bq. I observed that ResourceManager#rmDispatcher does not drain, is it a bug? I think the decision was to drain only the state-store-relevant dispatcher. rmDispatcher is a global dispatcher; draining it may take more time depending on how busy the cluster is. I think it's still fine to drain the rmDispatcher, but we need to evaluate the pros and cons. Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyscDispatcher#serviceStop from shutdown hook Key: YARN-2917 URL: https://issues.apache.org/jira/browse/YARN-2917 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Attachments: 0001-YARN-2917.patch, 0002-YARN-2917.patch I encountered a scenario where the RM hung while shutting down and kept logging {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
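The hang pattern discussed here -- serviceStop waiting forever for a drain that can never complete -- can be sketched as a drain wait with a deadline. This is a simplified stand-in, not the actual AsyncDispatcher code; the real fix in the attached patches may differ:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified illustration of draining an event queue on stop with a bound,
// so a shutdown-hook-initiated stop cannot block forever waiting for a drain.
public class DrainOnStop {
    private final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<>();

    void post(Runnable event) {
        eventQueue.add(event);
    }

    /** Dispatch remaining events until empty or the deadline passes; returns true if drained. */
    boolean stop(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!eventQueue.isEmpty() && System.currentTimeMillis() < deadline) {
            Runnable e = eventQueue.poll();   // non-blocking: never waits on a dead producer
            if (e != null) {
                e.run();
            }
        }
        return eventQueue.isEmpty();
    }

    public static void main(String[] args) {
        DrainOnStop d = new DrainOnStop();
        d.post(() -> System.out.println("handled pending event"));
        System.out.println("drained: " + d.stop(1000));
    }
}
```

The key design point is that the wait is bounded: a stop invoked from a shutdown hook while the dispatch thread has already exited still terminates.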
[jira] [Commented] (YARN-2749) Some testcases from TestLogAggregationService fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242861#comment-14242861 ] Xuan Gong commented on YARN-2749: - The -1 on findbugs is unrelated. Some testcases from TestLogAggregationService fails in trunk Key: YARN-2749 URL: https://issues.apache.org/jira/browse/YARN-2749 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2749.1.patch, YARN-2749.2.patch, YARN-2749.2.patch Some testcases from TestLogAggregationService fail in trunk. These can be reproduced on CentOS. Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2951) 20 Findbugs warnings on trunk in hadoop-yarn-server-nodemanager
Xuan Gong created YARN-2951: --- Summary: 20 Findbugs warnings on trunk in hadoop-yarn-server-nodemanager Key: YARN-2951 URL: https://issues.apache.org/jira/browse/YARN-2951 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2951) 20 Findbugs warnings on trunk in hadoop-yarn-server-nodemanager
[ https://issues.apache.org/jira/browse/YARN-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2951: Attachment: FindBugs_Report.html 20 Findbugs warnings on trunk in hadoop-yarn-server-nodemanager --- Key: YARN-2951 URL: https://issues.apache.org/jira/browse/YARN-2951 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Attachments: FindBugs_Report.html There are 20 findbugs warnings on trunk. See attached html file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2951) 20 Findbugs warnings on trunk in hadoop-yarn-server-nodemanager
[ https://issues.apache.org/jira/browse/YARN-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2951: Description: There are 20 findbugs warnings on trunk. See attached html file. 20 Findbugs warnings on trunk in hadoop-yarn-server-nodemanager --- Key: YARN-2951 URL: https://issues.apache.org/jira/browse/YARN-2951 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Attachments: FindBugs_Report.html There are 20 findbugs warnings on trunk. See attached html file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2951) 20 Findbugs warnings on trunk in hadoop-yarn-server-nodemanager
[ https://issues.apache.org/jira/browse/YARN-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242875#comment-14242875 ] Varun Saxena commented on YARN-2951: [~xgong], this is a duplicate of YARN-2937 20 Findbugs warnings on trunk in hadoop-yarn-server-nodemanager --- Key: YARN-2951 URL: https://issues.apache.org/jira/browse/YARN-2951 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Attachments: FindBugs_Report.html There are 20 findbugs warnings on trunk. See attached html file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2951) 20 Findbugs warnings on trunk in hadoop-yarn-server-nodemanager
[ https://issues.apache.org/jira/browse/YARN-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242879#comment-14242879 ] Varun Saxena commented on YARN-2951: JIRAs YARN-2937 through YARN-2940 will address the new findbugs warnings that appeared after bumping the findbugs version to 3.0.0 20 Findbugs warnings on trunk in hadoop-yarn-server-nodemanager --- Key: YARN-2951 URL: https://issues.apache.org/jira/browse/YARN-2951 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Attachments: FindBugs_Report.html There are 20 findbugs warnings on trunk. See attached html file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2951) 20 Findbugs warnings on trunk in hadoop-yarn-server-nodemanager
[ https://issues.apache.org/jira/browse/YARN-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong resolved YARN-2951. - Resolution: Duplicate 20 Findbugs warnings on trunk in hadoop-yarn-server-nodemanager --- Key: YARN-2951 URL: https://issues.apache.org/jira/browse/YARN-2951 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Attachments: FindBugs_Report.html There are 20 findbugs warnings on trunk. See attached html file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2951) 20 Findbugs warnings on trunk in hadoop-yarn-server-nodemanager
[ https://issues.apache.org/jira/browse/YARN-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242882#comment-14242882 ] Xuan Gong commented on YARN-2951: - Yes, it is. Closing this as a duplicate. Thanks, [~varun_saxena] 20 Findbugs warnings on trunk in hadoop-yarn-server-nodemanager --- Key: YARN-2951 URL: https://issues.apache.org/jira/browse/YARN-2951 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Attachments: FindBugs_Report.html There are 20 findbugs warnings on trunk. See attached html file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2749) Some testcases from TestLogAggregationService fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242884#comment-14242884 ] Xuan Gong commented on YARN-2749: - The findbugs warning will be fixed in https://issues.apache.org/jira/browse/YARN-2937 Some testcases from TestLogAggregationService fails in trunk Key: YARN-2749 URL: https://issues.apache.org/jira/browse/YARN-2749 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2749.1.patch, YARN-2749.2.patch, YARN-2749.2.patch Some testcases from TestLogAggregationService fail in trunk. These can be reproduced on CentOS. Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2950) Change message to mandate, not suggest JS requirement on UI
[ https://issues.apache.org/jira/browse/YARN-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote reassigned YARN-2950: - Assignee: Dustin Cote Change message to mandate, not suggest JS requirement on UI --- Key: YARN-2950 URL: https://issues.apache.org/jira/browse/YARN-2950 Project: Hadoop YARN Issue Type: Improvement Components: webapp Affects Versions: 2.5.0 Reporter: Harsh J Assignee: Dustin Cote Priority: Minor Labels: newbie Most of YARN's UIs do not work with JavaScript disabled in the browser, because they appear to send back data as JS arrays instead of within the actual HTML content. The JQueryUI prints only a mild warning about this, suggesting that {{This page works best with javascript enabled.}}, when in fact it ought to be {{This page will not function without javascript enabled. Please enable javascript on your browser.}} or something similarly direct. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2940) Fix new findbugs warnings in rest of the hadoop-yarn components
[ https://issues.apache.org/jira/browse/YARN-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242890#comment-14242890 ] Li Lu commented on YARN-2940: - Similar to the patch of YARN-2939, I could not reproduce the test failures locally. All the failures I can see are connection related, which appears to be unrelated to the changes in this patch. Fix new findbugs warnings in rest of the hadoop-yarn components --- Key: YARN-2940 URL: https://issues.apache.org/jira/browse/YARN-2940 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Saxena Assignee: Li Lu Attachments: YARN-2940-121014-1.patch, YARN-2940-121014.patch Fix findbugs warnings in the following YARN components: hadoop-yarn-applications-distributedshell hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-server-web-proxy hadoop-yarn-registry hadoop-yarn-server-common hadoop-yarn-client -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2943) Add a node-labels page in RM web UI
[ https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2943: - Attachment: YARN-2943.1.patch And also attached the patch. Add a node-labels page in RM web UI --- Key: YARN-2943 URL: https://issues.apache.org/jira/browse/YARN-2943 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, YARN-2943.1.patch Now we have node labels in the system, but there's no very convenient way to get information like "how many active NMs are assigned to a given label?", "how much total resource is there for a given label?", "for a given label, which queues can access it?", etc. It would be better to add a node-labels page in the RM web UI, so users/admins can have a centralized view of such information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2943) Add a node-labels page in RM web UI
[ https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2943: - Attachment: Nodes-page-with-label-filter.png Node-labels-page.png Attached screenshots for review. Add a node-labels page in RM web UI --- Key: YARN-2943 URL: https://issues.apache.org/jira/browse/YARN-2943 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, YARN-2943.1.patch Now we have node labels in the system, but there's no very convenient way to get information like "how many active NMs are assigned to a given label?", "how much total resource is there for a given label?", "for a given label, which queues can access it?", etc. It would be better to add a node-labels page in the RM web UI, so users/admins can have a centralized view of such information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2932) Add entry for preemption setting to queue status screen and startup/refresh logging
[ https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242909#comment-14242909 ] Wangda Tan commented on YARN-2932: -- bq. Actually, rather than getting the queue's 'preemption-disable' status, I think it would make more sense to get the queue's preemption status. So, something like getPreemptionStatus. It would return true or false, depending on whether the queue is preemptable or not. What do you think? Makes sense to me. Add entry for preemption setting to queue status screen and startup/refresh logging --- Key: YARN-2932 URL: https://issues.apache.org/jira/browse/YARN-2932 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.7.0 Reporter: Eric Payne Assignee: Eric Payne YARN-2056 enables the ability to turn preemption on or off at a per-queue level. This JIRA will provide the preemption status for each queue in the {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue refresh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2950) Change message to mandate, not suggest JS requirement on UI
[ https://issues.apache.org/jira/browse/YARN-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242929#comment-14242929 ] Joao Salcedo commented on YARN-2950: Improved messaging about the JavaScript requirement is needed. Change message to mandate, not suggest JS requirement on UI --- Key: YARN-2950 URL: https://issues.apache.org/jira/browse/YARN-2950 Project: Hadoop YARN Issue Type: Improvement Components: webapp Affects Versions: 2.5.0 Reporter: Harsh J Assignee: Dustin Cote Priority: Minor Labels: newbie Most of YARN's UIs do not work with JavaScript disabled in the browser, because they appear to send back data as JS arrays instead of within the actual HTML content. The JQueryUI prints only a mild warning about this, suggesting that {{This page works best with javascript enabled.}}, when in fact it ought to be {{This page will not function without javascript enabled. Please enable javascript on your browser.}} or something similarly direct. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2912) Jersey Tests failing with port in use
[ https://issues.apache.org/jira/browse/YARN-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242946#comment-14242946 ] Varun Saxena commented on YARN-2912: The test failure is unrelated. The findbugs warnings were introduced by bumping the findbugs version to 3.0.0 and will be addressed by separate JIRAs (already raised). Jersey Tests failing with port in use - Key: YARN-2912 URL: https://issues.apache.org/jira/browse/YARN-2912 Project: Hadoop YARN Issue Type: Bug Components: test Affects Versions: 3.0.0 Environment: jenkins on java 8 Reporter: Steve Loughran Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2912.patch Jersey tests like TestNMWebServices apps are failing with port in use. The Jersey test runner appears to always use the same port unless a system property is set to point to a different one. Every test should really be changing that sysprop in a @Before method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
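The fix direction described in the issue -- each test picking a fresh port in a {{@Before}} method -- can be sketched as below. Treat the {{jersey.test.port}} property name as an assumption about the Jersey 1.x {{JerseyTest}} runner; verify it against the Jersey version actually in use:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch: pick a free ephemeral port and hand it to the Jersey test runner
// via a system property, so parallel test classes don't collide.
public class JerseyPortHelper {
    /** Ask the OS for a currently-free port by binding port 0 and reading it back. */
    public static int pickFreePort() {
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();
        } catch (IOException e) {
            throw new RuntimeException("could not find a free port", e);
        }
    }

    // In a real test class this would run in a @Before method:
    //
    // @Before
    // public void setJerseyPort() {
    //     // "jersey.test.port" is assumed to be the property JerseyTest reads.
    //     System.setProperty("jersey.test.port", String.valueOf(pickFreePort()));
    // }

    public static void main(String[] args) {
        System.setProperty("jersey.test.port", String.valueOf(pickFreePort()));
        System.out.println("jersey.test.port=" + System.getProperty("jersey.test.port"));
    }
}
```

Note the small race inherent in this approach: the port is free when probed but could be taken before the test server binds it; in practice the window is tiny and this is a common test-suite idiom.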
[jira] [Commented] (YARN-1244) Missing yarn queue-cli
[ https://issues.apache.org/jira/browse/YARN-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242968#comment-14242968 ] Devaraj K commented on YARN-1244: - This was handled in YARN-2647; closing as a duplicate of YARN-2647. Missing yarn queue-cli -- Key: YARN-1244 URL: https://issues.apache.org/jira/browse/YARN-1244 Project: Hadoop YARN Issue Type: New Feature Reporter: Vinod Kumar Vavilapalli Labels: newbie Attachments: YARN-1244.1.patch We don't have a yarn queue CLI. For now mapred still has one that is working, but we need to move that functionality over to the yarn CLI itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1244) Missing yarn queue-cli
[ https://issues.apache.org/jira/browse/YARN-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242974#comment-14242974 ] Hadoop QA commented on YARN-1244: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12665668/YARN-1244.1.patch against trunk revision 8e9a266. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6087//console This message is automatically generated. Missing yarn queue-cli -- Key: YARN-1244 URL: https://issues.apache.org/jira/browse/YARN-1244 Project: Hadoop YARN Issue Type: New Feature Reporter: Vinod Kumar Vavilapalli Labels: newbie Attachments: YARN-1244.1.patch We don't have a yarn queue CLI. For now mapred still has one that is working, but we need to move over that functionality to yarn CLI itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2917) Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyscDispatcher#serviceStop from shutdown hook
[ https://issues.apache.org/jira/browse/YARN-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242988#comment-14242988 ] Hudson commented on YARN-2917: -- FAILURE: Integrated in Hadoop-trunk-Commit #6698 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6698/]) YARN-2917. Fixed potential deadlock when system.exit is called in AsyncDispatcher. Contributed by Rohith Sharmaks (jianhe: rev 614b6afea450ebb897fbb2519c6f02e13b9bd12d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyscDispatcher#serviceStop from shutdown hook Key: YARN-2917 URL: https://issues.apache.org/jira/browse/YARN-2917 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Fix For: 2.7.0 Attachments: 0001-YARN-2917.patch, 0002-YARN-2917.patch I encountered a scenario where the RM hung while shutting down and kept logging {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1244) Missing yarn queue-cli
[ https://issues.apache.org/jira/browse/YARN-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K resolved YARN-1244. - Resolution: Duplicate Missing yarn queue-cli -- Key: YARN-1244 URL: https://issues.apache.org/jira/browse/YARN-1244 Project: Hadoop YARN Issue Type: New Feature Reporter: Vinod Kumar Vavilapalli Labels: newbie Attachments: YARN-1244.1.patch We don't have a yarn queue CLI. For now mapred still has one that is working, but we need to move over that functionality to yarn CLI itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed
[ https://issues.apache.org/jira/browse/YARN-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2920: - Attachment: YARN-2920.4.patch [~jianhe], Thanks for your comments, I've updated the patch to address all your suggestions. For your comment: bq. how about containers running on a node without label, and now we are adding a label. Now we will also kill containers on that node; this will be changed after we get YARN-2498 in. I added a TODO note at {{CapacityScheduler.updateLabelsOnNode}}. Please kindly review, Wangda CapacityScheduler should be notified when labels on nodes changed - Key: YARN-2920 URL: https://issues.apache.org/jira/browse/YARN-2920 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2920.1.patch, YARN-2920.2.patch, YARN-2920.3.patch, YARN-2920.4.patch Currently, label changes on nodes are only handled by RMNodeLabelsManager, but that is not enough when labels on nodes change: - Scheduler should be able to take actions on running containers (like kill/preempt/do-nothing) - Used / available capacity in the scheduler should be updated for future planning. We need to add a new event to pass such updates to the scheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2943) Add a node-labels page in RM web UI
[ https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243056#comment-14243056 ] Hadoop QA commented on YARN-2943: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686641/YARN-2943.1.patch against trunk revision 8e9a266. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 40 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestRM Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6086//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6086//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6086//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6086//console This message is automatically generated. 
Add a node-labels page in RM web UI --- Key: YARN-2943 URL: https://issues.apache.org/jira/browse/YARN-2943 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, YARN-2943.1.patch Now we have node labels in the system, but there's no very convenient way to get information like "how many active NMs are assigned to a given label?", "how much total resource is there for a given label?", "for a given label, which queues can access it?", etc. It would be better to add a node-labels page in the RM web UI, so users/admins can have a centralized view of such information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243065#comment-14243065 ] Craig Welch commented on YARN-2637: --- I double checked - none of the findbugs warnings are related to my change, and the tests actually pass on my box with the change - and are unrelated in any case, as far as I can see. There's plenty of chatter on other JIRAs that this is related to the jdk/findbugs update... so, I believe these can be ignored. maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it checks whether the app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId() + " from user: "
        + application.getUser() + " activated in queue: " + getQueueName());
  }
}
{code}
An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the number of AMs that can be launched is 200, and if a user asks for 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all the resource of the queue instead of only max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2938) Fix new findbugs warnings in hadoop-yarn-resourcemanager and hadoop-yarn-applicationhistoryservice
[ https://issues.apache.org/jira/browse/YARN-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2938: --- Attachment: YARN-2938.002.patch Kick Jenkins with new patch. Same changes as previous patch. Fix new findbugs warnings in hadoop-yarn-resourcemanager and hadoop-yarn-applicationhistoryservice -- Key: YARN-2938 URL: https://issues.apache.org/jira/browse/YARN-2938 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Saxena Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2938.001.patch, YARN-2938.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2937) Fix new findbugs warnings in hadoop-yarn-nodemanager
[ https://issues.apache.org/jira/browse/YARN-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2937: --- Attachment: YARN-2937.002.patch Kick Jenkins. Patch same as previous one. Fix new findbugs warnings in hadoop-yarn-nodemanager Key: YARN-2937 URL: https://issues.apache.org/jira/browse/YARN-2937 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Saxena Assignee: Varun Saxena Fix For: 2.7.0 Attachments: HADOOP-11373.patch, YARN-2937.001.patch, YARN-2937.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed
[ https://issues.apache.org/jira/browse/YARN-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243130#comment-14243130 ] Jian He commented on YARN-2920: --- - getUsedResources -> getUsedResourcesByLabel - RMNodeLabelsManager constructor can pass rmContext, instead of a separate setRMDispatcher method - AM container is killed as well; should we not kill the AM container until the max-am-percentage is met, similar to preemption? CapacityScheduler should be notified when labels on nodes changed - Key: YARN-2920 URL: https://issues.apache.org/jira/browse/YARN-2920 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2920.1.patch, YARN-2920.2.patch, YARN-2920.3.patch, YARN-2920.4.patch Currently, changes to labels on nodes are only handled by RMNodeLabelsManager, but that is not enough: - The scheduler should be able to take actions on running containers (like kill/preempt/do-nothing). - Used/available capacity in the scheduler should be updated for future planning. We need to add a new event to pass such updates to the scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2003) Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side]
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243145#comment-14243145 ] Ashwin Shankar commented on YARN-2003: -- Hi [~sunilg], is there anything that needs to be done on the RM side for app priority to work in the fair scheduler? There is already a patch for app priority on the fair scheduler side, and I was wondering if anything in this JIRA is blocking it. Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side] -- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from the Submission Context and store it. Later this can be used by the Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed
[ https://issues.apache.org/jira/browse/YARN-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2920: - Attachment: YARN-2920.5.patch bq. getUsedResources -> getUsedResourcesByLabel And bq. RMNodeLabelsManager constructor can pass rmContext, instead of a separate setRMDispatcher method Makes sense to me, updated. bq. AM container is killed as well, should we not kill the am container until the max-am-percentage is met, similar to preemption? That needs updating the internal used resource for LeafQueue/ParentQueue. With YARN-2498, containers will not be immediately killed; the preemption policy will handle that, and the AM is already the last one killed by the preemption policy. Thanks, Wangda CapacityScheduler should be notified when labels on nodes changed - Key: YARN-2920 URL: https://issues.apache.org/jira/browse/YARN-2920 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2920.1.patch, YARN-2920.2.patch, YARN-2920.3.patch, YARN-2920.4.patch, YARN-2920.5.patch Currently, changes to labels on nodes are only handled by RMNodeLabelsManager, but that is not enough: - The scheduler should be able to take actions on running containers (like kill/preempt/do-nothing). - Used/available capacity in the scheduler should be updated for future planning. We need to add a new event to pass such updates to the scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243171#comment-14243171 ] Craig Welch commented on YARN-2495: --- Sorry if I'm jumping in late and asking redundant questions as a result, but I've gone through the various related jiras and the design documents (incl updates) ( and the patch :-) ) and I have some requirements related questions as a result. Just a bit of background to make sure I understand things - it appears that we've settled on two different but related features here: 1. To enable node labels to be added or removed for a given node without validation against a centralized list of node labels (not on this patch, but relevant to the discussion) and 2. To enable node managers to specify their node labels based on local configuration and scripting (this patch is specific to that feature). These are, strictly speaking, orthogonal, but may be used together and will provide something more in a 'combined feature' A couple things about this feature (2) - I don't believe that it is necessary to add the node label configuration to the local configuration (yarn-site) or the heartbeat as such to enable configuration of labels for a node from the node in a decentralized fashion (e.g. a script on the node saying put these labels on me). This can already be accomplished using the admin cli from a script or calling a web service from the node (most likely the former, but either is possible...), so I don't think we need this change to support the script case, it's already possible to write a script to add a label to a node on the fly today without any changes. To make this dynamic we would need feature 1, from the above, but that's not covered in this patch / is a separate discussion / and also does not require this change. 
Also, I don't see how this change allows a script to dynamically configure labels unless it was changing the yarn-site or the like (I may have missed it, but I don't see that logic here) - and in any case, it would not be necessary to add this logic to support that sort of configuration, as I pointed out. Is this all just to support putting labels into the node manager's configuration file and introducing them that way? Do we have a solid need for that? It's not needed for the dynamic script case, which is all I've seen discussed here from a requirements perspective (putting it into the config file / adding it to the heartbeat is implementation; I don't see a requirement for it as such). In a nutshell - do we need change 2 (this), or do we really just need change 1 (eliminating validation of labels against a centralized list, at least as a configurable option)? Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admins to specify labels in each NM; this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2944) SCMStore/InMemorySCMStore is not currently compatible with ReflectionUtils#newInstance
[ https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-2944: --- Attachment: YARN-2944-trunk-v1.patch Attached is the v1 trunk patch. SCMStore/InMemorySCMStore is not currently compatible with ReflectionUtils#newInstance -- Key: YARN-2944 URL: https://issues.apache.org/jira/browse/YARN-2944 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Minor Attachments: YARN-2944-trunk-v1.patch Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create the SCMStore service. Unfortunately the SCMStore class does not have a 0-argument constructor. On startup, the SCM fails with the following:
{noformat}
14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager failed in state INITED; cause: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
	at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
	at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
	at java.lang.Class.getConstructor0(Class.java:2763)
	at java.lang.Class.getDeclaredConstructor(Class.java:2021)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
	... 4 more
14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting SharedCacheManager
java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
	at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
	at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
	at java.lang.Class.getConstructor0(Class.java:2763)
	at java.lang.Class.getDeclaredConstructor(Class.java:2021)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
	... 4 more
{noformat}
This JIRA is to add a 0-argument constructor to SCMStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
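For context on why the failure above happens: Class#getDeclaredConstructor() with no arguments throws NoSuchMethodException when a class only declares parameterized constructors, which is the <init>() in the trace. A minimal sketch (toy class names, not the actual SCMStore/AppChecker types):

```java
import java.lang.reflect.Constructor;

public class ReflectionExample {
    // mimics a store class with only a parameterized constructor
    static class StoreWithoutDefaultCtor {
        StoreWithoutDefaultCtor(String appChecker) { }
    }

    // mimics the proposed fix: a 0-argument constructor exists
    static class StoreWithDefaultCtor {
        public StoreWithDefaultCtor() { }
    }

    public static void main(String[] args) throws Exception {
        try {
            // same lookup ReflectionUtils-style instantiation performs
            Constructor<?> c = StoreWithoutDefaultCtor.class.getDeclaredConstructor();
            System.out.println("unexpected: " + c);  // never reached
        } catch (NoSuchMethodException e) {
            System.out.println("fails like the SCM startup did: " + e);
        }
        // with a 0-arg constructor, reflective instantiation succeeds
        Object store = StoreWithDefaultCtor.class.getDeclaredConstructor().newInstance();
        System.out.println("created: " + store.getClass().getSimpleName());
    }
}
```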
[jira] [Resolved] (YARN-2168) SCM/Client/NM/Admin protocols
[ https://issues.apache.org/jira/browse/YARN-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo resolved YARN-2168. Resolution: Fixed Closing issue as comments have been addressed in other subtasks of YARN-1492. SCM/Client/NM/Admin protocols - Key: YARN-2168 URL: https://issues.apache.org/jira/browse/YARN-2168 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2168-trunk-v1.patch, YARN-2168-trunk-v2.patch This jira is meant to be used to review the main shared cache APIs. They are as follows: * ClientSCMProtocol - The protocol between the yarn client and the cache manager. This protocol controls how resources in the cache are claimed and released. ** UseSharedCacheResourceRequest ** UseSharedCacheResourceResponse ** ReleaseSharedCacheResourceRequest ** ReleaseSharedCacheResourceResponse * SCMAdminProtocol - This is an administrative protocol for the cache manager. It allows administrators to manually trigger cleaner runs. ** RunSharedCacheCleanerTaskRequest ** RunSharedCacheCleanerTaskResponse * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the cache manager. This allows the NodeManager to coordinate with the cache manager when uploading new resources to the shared cache. ** NotifySCMRequest ** NotifySCMResponse -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed
[ https://issues.apache.org/jira/browse/YARN-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243180#comment-14243180 ] Hadoop QA commented on YARN-2920: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686656/YARN-2920.4.patch against trunk revision 614b6af. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 28 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6088//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6088//artifact/patchprocess/newPatchFindbugsWarningshadoop-sls.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6088//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6088//console This message is automatically generated. 
CapacityScheduler should be notified when labels on nodes changed - Key: YARN-2920 URL: https://issues.apache.org/jira/browse/YARN-2920 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2920.1.patch, YARN-2920.2.patch, YARN-2920.3.patch, YARN-2920.4.patch, YARN-2920.5.patch Currently, changes to labels on nodes are only handled by RMNodeLabelsManager, but that is not enough: - The scheduler should be able to take actions on running containers (like kill/preempt/do-nothing). - Used/available capacity in the scheduler should be updated for future planning. We need to add a new event to pass such updates to the scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2937) Fix new findbugs warnings in hadoop-yarn-nodemanager
[ https://issues.apache.org/jira/browse/YARN-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243187#comment-14243187 ] Hadoop QA commented on YARN-2937: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686671/YARN-2937.002.patch against trunk revision b9f6d0c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6090//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6090//console This message is automatically generated. Fix new findbugs warnings in hadoop-yarn-nodemanager Key: YARN-2937 URL: https://issues.apache.org/jira/browse/YARN-2937 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Saxena Assignee: Varun Saxena Fix For: 2.7.0 Attachments: HADOOP-11373.patch, YARN-2937.001.patch, YARN-2937.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243197#comment-14243197 ] Craig Welch commented on YARN-2495: --- So, I see the language around the script based vs the conf based provider, etc, so I assume that's where the scripting side comes in. However, it's still not clear to me that it's really a good idea to add all of this when there is already a way to accomplish the activity with a script - and already ways to run scripts from the node manager... Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243208#comment-14243208 ] Junping Du commented on YARN-2637: -- Ok. Thanks for double-checking it. I will wait for [~leftnoteasy]'s review comments and may commit it tomorrow if there are no further comments. maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it checks whether the app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext();) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId() +
        " from user: " + application.getUser() +
        " activated in queue: " + getQueueName());
  }
}
{code}
An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, 200 AMs can be launched; if the user actually uses 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all the resource of the queue instead of only max_am_resource_percent of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2944) SCMStore/InMemorySCMStore is not currently compatible with ReflectionUtils#newInstance
[ https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243213#comment-14243213 ] Sangjin Lee commented on YARN-2944: --- Thanks for the patch [~ctrezzo]! The patch looks good to me. One small nit: the InMemorySCMStore (and SCMStore) constructor that takes an AppChecker instance is needed only for unit testing purposes, right? If so, can we make it default scope instead of public and mark it as visible for testing? SCMStore/InMemorySCMStore is not currently compatible with ReflectionUtils#newInstance -- Key: YARN-2944 URL: https://issues.apache.org/jira/browse/YARN-2944 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Minor Attachments: YARN-2944-trunk-v1.patch Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create the SCMStore service. Unfortunately the SCMStore class does not have a 0-argument constructor. On startup, the SCM fails with the following:
{noformat}
14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager failed in state INITED; cause: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
	at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
	at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
	at java.lang.Class.getConstructor0(Class.java:2763)
	at java.lang.Class.getDeclaredConstructor(Class.java:2021)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
	... 4 more
14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting SharedCacheManager
java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
	at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
	at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
	at java.lang.Class.getConstructor0(Class.java:2763)
	at java.lang.Class.getDeclaredConstructor(Class.java:2021)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
	... 4 more
{noformat}
This JIRA is to add a 0-argument constructor to SCMStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243217#comment-14243217 ] Allen Wittenauer commented on YARN-2495: bq. This can already be accomplished using the admin cli from a script or calling a web service from the node Think about the secure cluster case. A whole new level of complexity is required to get this functionality using your proposed method vs. having the NM just run the script itself. bq. Is this all just to support putting labels into the node manager's configuration file and introducing them that way? Do we have a solid need for that? No, this is so we *don't* have to have hard-coded labels in a file. If we are doing an external software change, we need to be able to reflect that change up the chain. Think rolling upgrade. Think multiple service owners. FWIW, yes, we do have a solid need for this feature. Almost every ops person I talked to has said they'd likely make use of it for the exact use cases I've highlighted above. Being able to roll out new versions of Java and make scheduling decisions based on its installation is *extremely* powerful and useful. 
Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2284) Find missing config options in YarnConfiguration and yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243237#comment-14243237 ] Robert Kanter commented on YARN-2284: - Overall looks good, a few minor things: - For the methods in {{Configuration}} that are only meant for testing, can you annotate them as {{@VisibleForTesting}}? - In {{TestConfigurationFieldsBase.compareConfigurationToXmlFields}} we can use a HashSet instead of a TreeSet - Can you add some Javadoc to the top of {{TestMapreduceConfigFields}} to explain what this class is for and how to use it? It should be clear enough that somebody can later make a subclass to check their project, without having to look into the other code too much. - We should turn this into a parent JIRA in HADOOP, and then have separate child JIRAs for YARN, MAPREDUCE, and HDFS to add the subclasses for yarn-site, mapred-site, and hdfs-site. Don't worry about the findbugs warnings as long as they're not from this patch; they recently upgraded the findbugs version and it's found some new ones -- there's a bunch of JIRAs to fix those. Find missing config options in YarnConfiguration and yarn-default.xml - Key: YARN-2284 URL: https://issues.apache.org/jira/browse/YARN-2284 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.4.1 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Attachments: YARN-2284-04.patch, YARN-2284-05.patch, YARN-2284-06.patch, YARN-2284-07.patch, YARN-2284-08.patch, YARN-2284-09.patch, YARN2284-01.patch, YARN2284-02.patch, YARN2284-03.patch YarnConfiguration has one set of properties. yarn-default.xml has another set of properties. Ideally, there should be an automatic way to find missing properties in either location. This is analogous to MAPREDUCE-5130, but for yarn-default.xml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
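The kind of comparison a test like {{TestConfigurationFieldsBase}} performs can be sketched as follows (a toy, self-contained illustration: the config class and property names here are made up, and real code would parse the actual *-default.xml file rather than use an in-memory set):

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Set;

public class ConfigFieldCheckExample {
    // hypothetical stand-in for a class like YarnConfiguration
    static class MyConfiguration {
        public static final String FOO_ENABLED = "my.foo.enabled";
        public static final String BAR_TIMEOUT = "my.bar.timeout-ms";
    }

    public static void main(String[] args) throws Exception {
        // properties that would be parsed out of the *-default.xml file
        Set<String> xmlProperties = new HashSet<>();
        xmlProperties.add("my.foo.enabled");

        // collect every public static final String key via reflection
        Set<String> codeProperties = new HashSet<>();
        for (Field f : MyConfiguration.class.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())
                    && Modifier.isFinal(f.getModifiers())
                    && f.getType() == String.class) {
                codeProperties.add((String) f.get(null));
            }
        }

        // keys declared in code but missing from the XML defaults
        Set<String> missingFromXml = new HashSet<>(codeProperties);
        missingFromXml.removeAll(xmlProperties);
        System.out.println("missing from xml: " + missingFromXml);
        // prints: missing from xml: [my.bar.timeout-ms]
    }
}
```

The same diff in the other direction (XML keys with no matching constant) would flag stale entries in the defaults file.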
[jira] [Commented] (YARN-2284) Find missing config options in YarnConfiguration and yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243241#comment-14243241 ] Robert Kanter commented on YARN-2284: - Sorry, those should be yarn-default, mapred-default, and hdfs-default; not *-site. Find missing config options in YarnConfiguration and yarn-default.xml - Key: YARN-2284 URL: https://issues.apache.org/jira/browse/YARN-2284 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.4.1 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Attachments: YARN-2284-04.patch, YARN-2284-05.patch, YARN-2284-06.patch, YARN-2284-07.patch, YARN-2284-08.patch, YARN-2284-09.patch, YARN2284-01.patch, YARN2284-02.patch, YARN2284-03.patch YarnConfiguration has one set of properties. yarn-default.xml has another set of properties. Ideally, there should be an automatic way to find missing properties in either location. This is analogous to MAPREDUCE-5130, but for yarn-default.xml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2938) Fix new findbugs warnings in hadoop-yarn-resourcemanager and hadoop-yarn-applicationhistoryservice
[ https://issues.apache.org/jira/browse/YARN-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243268#comment-14243268 ] Hadoop QA commented on YARN-2938: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686669/YARN-2938.002.patch against trunk revision b9f6d0c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6089//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6089//console This message is automatically generated. 
Fix new findbugs warnings in hadoop-yarn-resourcemanager and hadoop-yarn-applicationhistoryservice -- Key: YARN-2938 URL: https://issues.apache.org/jira/browse/YARN-2938 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Saxena Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2938.001.patch, YARN-2938.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2944) SCMStore/InMemorySCMStore is not currently compatible with ReflectionUtils#newInstance
[ https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243280#comment-14243280 ] Hadoop QA commented on YARN-2944: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686681/YARN-2944-trunk-v1.patch against trunk revision 0bcea11. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager: org.apache.hadoop.yarn.server.sharedcachemanager.TestSharedCacheUploaderService org.apache.hadoop.yarn.server.sharedcachemanager.TestClientSCMProtocolService Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6092//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6092//console This message is automatically generated. 
SCMStore/InMemorySCMStore is not currently compatible with ReflectionUtils#newInstance -- Key: YARN-2944 URL: https://issues.apache.org/jira/browse/YARN-2944 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Minor Attachments: YARN-2944-trunk-v1.patch Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create the SCMStore service. Unfortunately the SCMStore class does not have a 0-argument constructor. On startup, the SCM fails with the following: {noformat} 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager failed in state INITED; cause: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156) Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at java.lang.Class.getConstructor0(Class.java:2763) at java.lang.Class.getDeclaredConstructor(Class.java:2021) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125) ... 
4 more 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting SharedCacheManager java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156) Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at java.lang.Class.getConstructor0(Class.java:2763) at
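For readers unfamiliar with the failure mode above: reflective factories in the style of ReflectionUtils#newInstance look up a zero-argument constructor, and a class that declares only parameterized constructors makes that lookup throw NoSuchMethodException, exactly as the stack trace shows. A minimal sketch (not Hadoop's actual implementation):

```java
public class ZeroArgCtorDemo {

    // Stands in for the pre-fix InMemorySCMStore: only a parameterized constructor.
    public static class NoDefaultCtor {
        public NoDefaultCtor(String arg) { }
    }

    // The shape of the YARN-2944 fix: a 0-argument constructor exists.
    public static class WithDefaultCtor {
        public WithDefaultCtor() { }
    }

    static <T> T newInstance(Class<T> clazz) {
        try {
            // getDeclaredConstructor() with no argument types asks for the 0-arg ctor.
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);  // mirrors ReflectionUtils wrapping the failure
        }
    }

    public static void main(String[] args) {
        System.out.println(newInstance(WithDefaultCtor.class).getClass().getSimpleName());
        try {
            newInstance(NoDefaultCtor.class);
        } catch (RuntimeException e) {
            // Same root cause as the SCM startup failure above.
            System.out.println(e.getCause().getClass().getSimpleName());
        }
    }
}
```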
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243283#comment-14243283 ] Craig Welch commented on YARN-2495: --- Other comments on the patch as such, assuming we really do need this part of the change... DECENTRALIZED_CONFIGURATION_ENABLED et al. - I do see the basis for enabling and disabling the enforcement of a centralized list and the discussion around that, but I don't see any reason to have conditional enablement of the node manager side of things as well as a provider specification, and I think it just adds unnecessary complexity and possible surprise at configuration time. At the level of this configuration I think it should just be enabled (and I don't mean just by default; I mean that if we add this, it should just be a way to manage node labels, not conditionally enabled or disabled, any more than the web service or CLI are conditionally enabled or disabled, and so we don't have this parameter / its associated branching at all). 
I think the default node labels provider service should definitely be a null provider that always returns an empty list and areLabelsUpdated false - this takes out the need to decide which is default (a no-op one is), and it allows us to get rid of the extra enabled/disabled configuration above without adding a new configuration (the provider will be specified anyway if the feature is going to be used). NodeHeartbeatRequest - isNodeLabelsUpdated - I would go with areNodeLabelsSet (all isNodeLabels = areNodeLabels wherever it appears, actually). Wrt Set vs Updated - this is primarily a workaround for the null/empty ambiguity, and I think this name better reflects what is really going on (am I sending a value to act on or not), but I also think that this is a better contract: the receiver (RM) shouldn't really care about the logic the NM side is using to decide whether or not to set its labels (freshness, updatedness, whatever), so all that should be communicated in the API is whether or not the value is set, not whether it's an update / whether it's checking freshness, etc. That's a nit, but I think it's a clearer name. RegisterNodeLabelManagerResponse - get/set IsNodeLabelsAcceptedByRM - I would make it get/set AreNodeLabelsAcceptedByRM (and on impls, etc., of course). RegisterNodeManagerRequest - missing spaces in args (l 42). Also, assuming we drop the distributed on/off config as I'm suggesting, you'll need the areNodeLabelsSet to be passed here as well. (I also like this better because it harmonizes the API between registration and heartbeat, which is easier to understand b/c they are doing the same thing / should do it the same way.) 
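The no-op default provider suggested above could look roughly like the following sketch; the interface and method names here are illustrative, not YARN's actual API:

```java
import java.util.Collections;
import java.util.Set;

public class NullProviderDemo {

    // Hypothetical provider contract: what labels to report, and whether a
    // value is being sent at all ("am I sending a value to act on or not").
    interface NodeLabelsProvider {
        Set<String> getNodeLabels();
        boolean areNodeLabelsSet();
    }

    // The suggested default: always empty, never claims to have set labels,
    // so no separate enabled/disabled switch is needed.
    static final NodeLabelsProvider NULL_PROVIDER = new NodeLabelsProvider() {
        @Override public Set<String> getNodeLabels() { return Collections.emptySet(); }
        @Override public boolean areNodeLabelsSet() { return false; }
    };

    public static void main(String[] args) {
        // Heartbeat/registration code can call the provider unconditionally;
        // with the null provider the RM simply never sees a labels update.
        if (NULL_PROVIDER.areNodeLabelsSet()) {
            System.out.println("sending labels: " + NULL_PROVIDER.getNodeLabels());
        } else {
            System.out.println("no labels to send");
        }
    }
}
```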
Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using a script as suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2938) Fix new findbugs warnings in hadoop-yarn-resourcemanager and hadoop-yarn-applicationhistoryservice
[ https://issues.apache.org/jira/browse/YARN-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243289#comment-14243289 ] Varun Saxena commented on YARN-2938: [~zjshen], kindly review. Fix new findbugs warnings in hadoop-yarn-resourcemanager and hadoop-yarn-applicationhistoryservice -- Key: YARN-2938 URL: https://issues.apache.org/jira/browse/YARN-2938 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Saxena Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2938.001.patch, YARN-2938.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2937) Fix new findbugs warnings in hadoop-yarn-nodemanager
[ https://issues.apache.org/jira/browse/YARN-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2937: --- Attachment: YARN-2937.003.patch Fix new findbugs warnings in hadoop-yarn-nodemanager Key: YARN-2937 URL: https://issues.apache.org/jira/browse/YARN-2937 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Saxena Assignee: Varun Saxena Fix For: 2.7.0 Attachments: HADOOP-11373.patch, YARN-2937.001.patch, YARN-2937.002.patch, YARN-2937.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2946) Deadlock in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243320#comment-14243320 ] Hadoop QA commented on YARN-2946: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686604/0001-YARN-2946.patch against trunk revision 0bcea11. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6093//console This message is automatically generated. Deadlock in ZKRMStateStore -- Key: YARN-2946 URL: https://issues.apache.org/jira/browse/YARN-2946 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-2946.patch, TestYARN2946.java Found one deadlock in ZKRMStateStore. # In the initial stage, zkClient is null because of a ZK disconnected event. # When ZKRMStateStore#runWithCheck() calls wait(zkSessionTimeout) for zkClient to re-establish the ZooKeeper connection via either a SyncConnected or Expired event, it is highly possible that some other thread can obtain the lock on {{ZKRMStateStore.this}} from state machine transition events. This causes a deadlock in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
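The hazard class described in this report can be illustrated in isolation: Object.wait() releases only the monitor it is called on, so waiting while still holding a second lock blocks every thread that needs that second lock to deliver the signal. A simplified, self-terminating sketch (not the actual ZKRMStateStore code):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class NestedLockWait {

    // The waiter holds `outer` (standing in for ZKRMStateStore.this) while
    // waiting on `inner`; wait() releases only `inner`, so the signalling
    // thread, which also needs `outer`, is stuck.
    static boolean demo(long timeoutMs) {
        final Object outer = new Object();
        final Object inner = new Object();
        final AtomicBoolean ready = new AtomicBoolean(false);

        synchronized (outer) {
            Thread signaller = new Thread(() -> {
                synchronized (outer) {              // blocked until the waiter gives up
                    synchronized (inner) {
                        ready.set(true);
                        inner.notifyAll();
                    }
                }
            });
            signaller.start();

            synchronized (inner) {
                long deadline = System.currentTimeMillis() + timeoutMs;
                while (!ready.get()) {
                    long left = deadline - System.currentTimeMillis();
                    if (left <= 0) {
                        return false;               // timed out; a real caller just hangs
                    }
                    try {
                        inner.wait(left);           // releases inner, NOT outer
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return false;
                    }
                }
                return true;
            }
        }
    }

    public static void main(String[] args) {
        // The signal never arrives while we hold `outer`, so this times out.
        System.out.println(demo(300) ? "signalled" : "timed out");
    }
}
```

The usual fix is to never block waiting for an external event while holding a lock that other components need for their own transitions.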
[jira] [Commented] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed
[ https://issues.apache.org/jira/browse/YARN-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243338#comment-14243338 ] Hadoop QA commented on YARN-2920: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686676/YARN-2920.5.patch against trunk revision 0bcea11. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 28 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6091//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6091//artifact/patchprocess/newPatchFindbugsWarningshadoop-sls.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6091//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6091//console This message is automatically generated. 
CapacityScheduler should be notified when labels on nodes changed - Key: YARN-2920 URL: https://issues.apache.org/jira/browse/YARN-2920 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2920.1.patch, YARN-2920.2.patch, YARN-2920.3.patch, YARN-2920.4.patch, YARN-2920.5.patch Currently, changes to labels on nodes will only be handled by RMNodeLabelsManager, but that is not enough upon such changes: - Scheduler should be able to take actions on running containers (like kill/preempt/do-nothing). - Used / available capacity in the scheduler should be updated for future planning. We need to add a new event to pass such updates to the scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2938) Fix new findbugs warnings in hadoop-yarn-resourcemanager and hadoop-yarn-applicationhistoryservice
[ https://issues.apache.org/jira/browse/YARN-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243358#comment-14243358 ] Zhijie Shen commented on YARN-2938: --- will review Fix new findbugs warnings in hadoop-yarn-resourcemanager and hadoop-yarn-applicationhistoryservice -- Key: YARN-2938 URL: https://issues.apache.org/jira/browse/YARN-2938 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Saxena Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2938.001.patch, YARN-2938.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2944) SCMStore/InMemorySCMStore is not currently compatible with ReflectionUtils#newInstance
[ https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243365#comment-14243365 ] Chris Trezzo commented on YARN-2944: Thanks Sangjin. Ack, will adjust scope of the InMemorySCMStore constructor and fix the unit tests. SCMStore/InMemorySCMStore is not currently compatible with ReflectionUtils#newInstance -- Key: YARN-2944 URL: https://issues.apache.org/jira/browse/YARN-2944 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Minor Attachments: YARN-2944-trunk-v1.patch Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create the SCMStore service. Unfortunately the SCMStore class does not have a 0-argument constructor. On startup, the SCM fails with the following: {noformat} 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager failed in state INITED; cause: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156) Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at java.lang.Class.getConstructor0(Class.java:2763) at java.lang.Class.getDeclaredConstructor(Class.java:2021) at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125) ... 4 more 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting SharedCacheManager java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156) Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init() at java.lang.Class.getConstructor0(Class.java:2763) at java.lang.Class.getDeclaredConstructor(Class.java:2021) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125) ... 4 more {noformat} This JIRA is to add a 0-argument constructor to SCMStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2946) Deadlock in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243364#comment-14243364 ] Jian He commented on YARN-2946: --- looks good, +1 Deadlock in ZKRMStateStore -- Key: YARN-2946 URL: https://issues.apache.org/jira/browse/YARN-2946 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-2946.patch, TestYARN2946.java Found one deadlock in ZKRMStateStore. # In the initial stage, zkClient is null because of a ZK disconnected event. # When ZKRMStateStore#runWithCheck() calls wait(zkSessionTimeout) for zkClient to re-establish the ZooKeeper connection via either a SyncConnected or Expired event, it is highly possible that some other thread can obtain the lock on {{ZKRMStateStore.this}} from state machine transition events. This causes a deadlock in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2943) Add a node-labels page in RM web UI
[ https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243371#comment-14243371 ] Jian He commented on YARN-2943: --- - {{LOG.info(yyy);}} ? - typo {{// Nodes will show need include Non-empty label filter}} - If I restart NM, “num of active NMs” is incorrect; this is probably because NM is using ephemeral ports; - “NO_LABEL” - “N/A”; similarly, update the previous nodes page to show “N/A” for unlabeled nodes; - usability: the num of active NMs link can point to a filtered list of nodes by label? - pullRMNodeLabelsInfo - it’s calculated on demand and each time it loops over all the nodes in the cluster; we can probably promote Label to a separate class and internally bookkeep the number of NMs. Add a node-labels page in RM web UI --- Key: YARN-2943 URL: https://issues.apache.org/jira/browse/YARN-2943 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, YARN-2943.1.patch Now we have node labels in the system, but there's no very convenient way to get information like: how many active NM(s) are assigned to a given label?, how much total resource is there for a given label?, for a given label, which queues can access it?, etc. It will be better to add a node-labels page in the RM web UI, so users/admins can have a centralized view to see such information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2946) Deadlock in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243378#comment-14243378 ] Rohith commented on YARN-2946: -- Thanks [~jianhe] for reviewing the analysis and patch. It seems there is some compilation error; I will take a look at this and update the patch. Deadlock in ZKRMStateStore -- Key: YARN-2946 URL: https://issues.apache.org/jira/browse/YARN-2946 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-2946.patch, TestYARN2946.java Found one deadlock in ZKRMStateStore. # In the initial stage, zkClient is null because of a ZK disconnected event. # When ZKRMStateStore#runWithCheck() calls wait(zkSessionTimeout) for zkClient to re-establish the ZooKeeper connection via either a SyncConnected or Expired event, it is highly possible that some other thread can obtain the lock on {{ZKRMStateStore.this}} from state machine transition events. This causes a deadlock in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2937) Fix new findbugs warnings in hadoop-yarn-nodemanager
[ https://issues.apache.org/jira/browse/YARN-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243381#comment-14243381 ] Hadoop QA commented on YARN-2937: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686704/YARN-2937.003.patch against trunk revision 0bcea11. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6094//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6094//console This message is automatically generated. Fix new findbugs warnings in hadoop-yarn-nodemanager Key: YARN-2937 URL: https://issues.apache.org/jira/browse/YARN-2937 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Saxena Assignee: Varun Saxena Fix For: 2.7.0 Attachments: HADOOP-11373.patch, YARN-2937.001.patch, YARN-2937.002.patch, YARN-2937.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2946) Deadlock in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243384#comment-14243384 ] Rohith commented on YARN-2946: -- I had changed the method modifier from public to private, causing the compilation error, because the method is directly used from a test. I will update the patch without modifying method modifiers. Deadlock in ZKRMStateStore -- Key: YARN-2946 URL: https://issues.apache.org/jira/browse/YARN-2946 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-2946.patch, TestYARN2946.java Found one deadlock in ZKRMStateStore. # In the initial stage, zkClient is null because of a ZK disconnected event. # When ZKRMStateStore#runWithCheck() calls wait(zkSessionTimeout) for zkClient to re-establish the ZooKeeper connection via either a SyncConnected or Expired event, it is highly possible that some other thread can obtain the lock on {{ZKRMStateStore.this}} from state machine transition events. This causes a deadlock in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2946) Deadlock in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243385#comment-14243385 ] Hadoop QA commented on YARN-2946: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686604/0001-YARN-2946.patch against trunk revision 0bcea11. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6095//console This message is automatically generated. Deadlock in ZKRMStateStore -- Key: YARN-2946 URL: https://issues.apache.org/jira/browse/YARN-2946 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-2946.patch, TestYARN2946.java Found one deadlock in ZKRMStateStore. # In the initial stage, zkClient is null because of a ZK disconnected event. # When ZKRMStateStore#runWithCheck() calls wait(zkSessionTimeout) for zkClient to re-establish the ZooKeeper connection via either a SyncConnected or Expired event, it is highly possible that some other thread can obtain the lock on {{ZKRMStateStore.this}} from state machine transition events. This causes a deadlock in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2946) Deadlock in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243391#comment-14243391 ] Varun Saxena commented on YARN-2946: [~rohithsharma], making the method private makes sense. You can probably use the VisibleForTesting annotation. Deadlock in ZKRMStateStore -- Key: YARN-2946 URL: https://issues.apache.org/jira/browse/YARN-2946 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-2946.patch, TestYARN2946.java Found one deadlock in ZKRMStateStore. # In the initial stage, zkClient is null because of a ZK disconnected event. # When ZKRMStateStore#runWithCheck() calls wait(zkSessionTimeout) for zkClient to re-establish the ZooKeeper connection via either a SyncConnected or Expired event, it is highly possible that some other thread can obtain the lock on {{ZKRMStateStore.this}} from state machine transition events. This causes a deadlock in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed
[ https://issues.apache.org/jira/browse/YARN-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2920: - Attachment: YARN-2920.6.patch Fixed test failure (TestResourceTracker is related, but the TestAbstractYarnScheduler failure cannot be reproduced locally). Findbugs warnings are not related. CapacityScheduler should be notified when labels on nodes changed - Key: YARN-2920 URL: https://issues.apache.org/jira/browse/YARN-2920 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2920.1.patch, YARN-2920.2.patch, YARN-2920.3.patch, YARN-2920.4.patch, YARN-2920.5.patch, YARN-2920.6.patch Currently, changes to labels on nodes will only be handled by RMNodeLabelsManager, but that is not enough upon such changes: - Scheduler should be able to take actions on running containers (like kill/preempt/do-nothing). - Used / available capacity in the scheduler should be updated for future planning. We need to add a new event to pass such updates to the scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2946) Deadlock in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243396#comment-14243396 ] Varun Saxena commented on YARN-2946: I guess same-package (package-private) access can be given instead of public. Private may not work because the annotation is only for documentation purposes. Deadlock in ZKRMStateStore -- Key: YARN-2946 URL: https://issues.apache.org/jira/browse/YARN-2946 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-2946.patch, TestYARN2946.java Found one deadlock in ZKRMStateStore. # In the initial stage, zkClient is null because of a ZK disconnected event. # When ZKRMStateStore#runWithCheck() calls wait(zkSessionTimeout) for zkClient to re-establish the ZooKeeper connection via either a SyncConnected or Expired event, it is highly possible that some other thread can obtain the lock on {{ZKRMStateStore.this}} from state machine transition events. This causes a deadlock in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
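As the comment above notes, Guava's @VisibleForTesting carries no runtime behavior; the access modifier itself is what controls access, and package-private is the usual compromise that lets a same-package test call the method without making it public. A small self-contained illustration, using a local stand-in for the Guava annotation:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class VisibilityDemo {

    // Stand-in for Guava's com.google.common.annotations.VisibleForTesting:
    // purely documentary, no effect on the access rules the compiler enforces.
    @Retention(RetentionPolicy.SOURCE)
    @interface VisibleForTesting { }

    @VisibleForTesting
    static int internalCounter() {      // package-private, not public
        return 42;
    }

    public static void main(String[] args) {
        // A test class in the same package can call this directly, exactly
        // like this same-package call; code in other packages cannot.
        System.out.println(internalCounter());
    }
}
```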
[jira] [Commented] (YARN-2243) Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor
[ https://issues.apache.org/jira/browse/YARN-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243409#comment-14243409 ] Tsuyoshi OZAWA commented on YARN-2243: -- Good catch, +1. From the javadoc of [Google Guava|https://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/base/Preconditions.html]: {code} checkArgument(boolean expression, Object errorMessage) {code} Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor Key: YARN-2243 URL: https://issues.apache.org/jira/browse/YARN-2243 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.1 Reporter: Ted Yu Assignee: Devaraj K Priority: Minor Attachments: YARN-2243.patch, YARN-2243.patch {code} public SchedulerApplicationAttempt(ApplicationAttemptId applicationAttemptId, String user, Queue queue, ActiveUsersManager activeUsersManager, RMContext rmContext) { Preconditions.checkNotNull("RMContext should not be null", rmContext); {code} Order of arguments is wrong for Preconditions.checkNotNull(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2243) Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor
[ https://issues.apache.org/jira/browse/YARN-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243412#comment-14243412 ] Tsuyoshi OZAWA commented on YARN-2243: -- {code} checkNotNull(T reference, Object errorMessage) {code} Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor Key: YARN-2243 URL: https://issues.apache.org/jira/browse/YARN-2243 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.1 Reporter: Ted Yu Assignee: Devaraj K Priority: Minor Attachments: YARN-2243.patch, YARN-2243.patch {code} public SchedulerApplicationAttempt(ApplicationAttemptId applicationAttemptId, String user, Queue queue, ActiveUsersManager activeUsersManager, RMContext rmContext) { Preconditions.checkNotNull("RMContext should not be null", rmContext); {code} Order of arguments is wrong for Preconditions.checkNotNull(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
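The bug is easy to demonstrate: Guava's checkNotNull(T reference, Object errorMessage) validates its first argument, so passing the message string first means the never-null message is checked and a null rmContext passes silently. The sketch below uses a minimal stand-in for the Guava method so it runs without Guava on the classpath:

```java
public class CheckNotNullOrder {

    // Minimal stand-in matching Guava's Preconditions.checkNotNull(T, Object):
    // the FIRST argument is the reference being checked.
    static <T> T checkNotNull(T reference, Object errorMessage) {
        if (reference == null) {
            throw new NullPointerException(String.valueOf(errorMessage));
        }
        return reference;
    }

    public static void main(String[] args) {
        Object rmContext = null;

        // The buggy order from SchedulerApplicationAttempt: the message string
        // is what gets null-checked, so nothing is thrown despite the null.
        checkNotNull("RMContext should not be null", rmContext);
        System.out.println("swapped args: no exception despite null rmContext");

        // The correct order: the null reference is actually caught.
        try {
            checkNotNull(rmContext, "RMContext should not be null");
        } catch (NullPointerException e) {
            System.out.println("correct order: " + e.getMessage());
        }
    }
}
```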
[jira] [Commented] (YARN-2942) Aggregated Log Files should be compacted
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243413#comment-14243413 ] Robert Kanter commented on YARN-2942: - Thanks for taking a look at the proposal Zhijie. Ya, it looks like YARN-2548 is related. That one looks to be more about long running jobs, and for this one I hadn't really considered those; this only works after the job finishes. 1. That's true. This design doesn't currently address that. However, the format used by the compacted files isn't anything special; the data is just dumped into the file and an index written to the index file for each container. As far as this format is concerned, we should be able to append more logs and indices to it. We would just need to figure out a good way to manage when they're appended and how this compaction process is triggered. 2. Yes. We'd leave the original aggregated logs until the compacted log is available. The JHS would continue using the aggregated log files until the compacted log file is ready. 3. I might not have been clear about that in the design. The RM would be the one to figure out when the app is done and the aggregated logs can be compacted. We'd run the actual compacting code in one of the NMs, so that the RM isn't spending cycles doing that, and so that we don't end up with a replica of each compacted log on one datanode (in other words, the RM would choose, at random or round-robin, an NM to do each app's compaction; this will cause the replicas to be spread around the cluster). 4. That's a good question; though I don't think the index is the problem here. It's small enough that we could always just rewrite a new index to replace the stale one. I think the problem would be with the compacted log file itself because we can't simply delete a chunk of it on HDFS; and it's big enough that there would be a lot of overhead to rewriting it. 
One solution here is to write a new compacted log file every N containers or after some file size threshold, and we can do cleanup by deleting an earlier compacted log file and updating the index. The downside to this is that the lifetimes of the containers in a compacted log file would not all be equal, but that's probably okay. Perhaps we can start out with this design, and then modify it for long running jobs that support YARN-2468 to have some other way of: - Triggering/Managing the compaction process (#1) - Deleting old logs (#4) Perhaps we can use this JIRA for normal jobs and then use YARN-2548 to add support to it for long running jobs? What do you think [~zjshen] and [~xgong]? Aggregated Log Files should be compacted Key: YARN-2942 URL: https://issues.apache.org/jira/browse/YARN-2942 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: CompactedAggregatedLogsProposal_v1.pdf, YARN-2942-preliminary.001.patch Turning on log aggregation allows users to easily store container logs in HDFS and subsequently view them in the YARN web UIs from a central place. Currently, there is a separate log file for each Node Manager. This can be a problem for HDFS if you have a cluster with many nodes as you’ll slowly start accumulating many (possibly small) files per YARN application. The current “solution” for this problem is to configure YARN (actually the JHS) to automatically delete these files after some amount of time. We should improve this by compacting the per-node aggregated log files into one log file per application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
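The append-friendly layout discussed in points 1 and 4 could look roughly like this (a hypothetical illustration, not the format in the attached patch): each container's log bytes are appended to one data stream while a separate index maps each container to its offset and length, so new containers can be appended, and individual logs located, without rewriting earlier entries.

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

public class CompactedLogSketch {
    // Index file stand-in: containerId -> {offset, length} into the data stream.
    final Map<String, long[]> index = new LinkedHashMap<String, long[]>();
    // Compacted log file stand-in: one concatenated byte stream.
    final ByteArrayOutputStream data = new ByteArrayOutputStream();

    // Appending a new container's log never touches earlier entries,
    // which is what makes the format usable for later compaction rounds.
    void append(String containerId, byte[] log) {
        index.put(containerId, new long[] { data.size(), log.length });
        data.write(log, 0, log.length);
    }

    // Reading one container's log needs only its index entry.
    byte[] read(String containerId) {
        long[] entry = index.get(containerId);
        byte[] all = data.toByteArray();
        byte[] out = new byte[(int) entry[1]];
        System.arraycopy(all, (int) entry[0], out, 0, out.length);
        return out;
    }
}
```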
[jira] [Updated] (YARN-2946) Deadlock in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2946: - Attachment: 0002-YARN-2946.patch Deadlock in ZKRMStateStore -- Key: YARN-2946 URL: https://issues.apache.org/jira/browse/YARN-2946 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-2946.patch, 0002-YARN-2946.patch, TestYARN2946.java Found one deadlock in ZKRMStateStore. # In the initial stage, zkClient is null because of a zk Disconnected event. # When ZKRMStateStore#runWithCheck() does wait(zkSessionTimeout) for zkClient to re-establish the zookeeper connection via either a SyncConnected or an Expired event, it is highly possible that some other thread can obtain the lock on {{ZKRMStateStore.this}} from state machine transition events. This causes the deadlock in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2946) Deadlock in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243452#comment-14243452 ] Rohith commented on YARN-2946: -- I updated the patch by changing the test case. It should be fine now. Deadlock in ZKRMStateStore -- Key: YARN-2946 URL: https://issues.apache.org/jira/browse/YARN-2946 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-2946.patch, 0002-YARN-2946.patch, TestYARN2946.java Found one deadlock in ZKRMStateStore. # In the initial stage, zkClient is null because of a zk Disconnected event. # When ZKRMStateStore#runWithCheck() does wait(zkSessionTimeout) for zkClient to re-establish the zookeeper connection via either a SyncConnected or an Expired event, it is highly possible that some other thread can obtain the lock on {{ZKRMStateStore.this}} from state machine transition events. This causes the deadlock in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2946) Deadlock in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243458#comment-14243458 ] Rohith commented on YARN-2946: -- Kindly review the updated patch, which fixes the compilation errors. Deadlock in ZKRMStateStore -- Key: YARN-2946 URL: https://issues.apache.org/jira/browse/YARN-2946 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-2946.patch, 0002-YARN-2946.patch, TestYARN2946.java Found one deadlock in ZKRMStateStore. # In the initial stage, zkClient is null because of a zk Disconnected event. # When ZKRMStateStore#runWithCheck() does wait(zkSessionTimeout) for zkClient to re-establish the zookeeper connection via either a SyncConnected or an Expired event, it is highly possible that some other thread can obtain the lock on {{ZKRMStateStore.this}} from state machine transition events. This causes the deadlock in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2929) Adding separator ApplicationConstants.FILE_PATH_SEPARATOR for better Windows support
[ https://issues.apache.org/jira/browse/YARN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243482#comment-14243482 ] Chris Nauroth commented on YARN-2929: - I apologize, but I'm still having trouble seeing the usefulness of this. Using {{Path}} as I described earlier effectively changes any file path into valid URI syntax, using forward slashes instead of back slashes. I expect this would then be a valid, usable path at the NodeManager regardless of its OS. Even if the path originates from a shell script, I don't see how that would make a difference. Do you have an example YARN application submission that would demonstrate the problem in more detail? Alternatively, if you could point out a spot in Spark's YARN application submission code that demonstrates the problem, then I could look at that. I am assuming here that a path originating from a shell script would get passed into a Spark Java process, where Spark code would have an opportunity to use the {{Path}} class like I described. Please let me know if my assumption is wrong. There isn't anything necessarily wrong with the patch posted. It just looks to me at this point like it isn't required. By minimizing token replacement rules like this, we'd reduce the number of special cases that YARN application writers would need to consider. Adding separator ApplicationConstants.FILE_PATH_SEPARATOR for better Windows support Key: YARN-2929 URL: https://issues.apache.org/jira/browse/YARN-2929 Project: Hadoop YARN Issue Type: Improvement Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2929.001.patch Some frameworks like Spark is tackling to run jobs on Windows(SPARK-1825). For better multiple platform support, we should introduce ApplicationConstants.FILE_PATH_SEPARATOR for making filepath platform-independent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed
[ https://issues.apache.org/jira/browse/YARN-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243568#comment-14243568 ] Hadoop QA commented on YARN-2920: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686715/YARN-2920.6.patch against trunk revision f6f2a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 28 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler The following test timeouts occurred in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6096//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6096//artifact/patchprocess/newPatchFindbugsWarningshadoop-sls.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6096//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6096//console This message is automatically generated. CapacityScheduler should be notified when labels on nodes changed - Key: YARN-2920 URL: https://issues.apache.org/jira/browse/YARN-2920 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2920.1.patch, YARN-2920.2.patch, YARN-2920.3.patch, YARN-2920.4.patch, YARN-2920.5.patch, YARN-2920.6.patch Currently, labels on nodes changes will only be handled by RMNodeLabelsManager, but that is not enough upon labels on nodes changes: - Scheduler should be able to do take actions to running containers. (Like kill/preempt/do-nothing) - Used / available capacity in scheduler should be updated for future planning. We need add a new event to pass such updates to scheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM
[ https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243567#comment-14243567 ] Jian He commented on YARN-2762: --- looks good, one minor comment: we can have a common method for this: {code} Set<String> labels = new HashSet<String>(); for (String p : args.split(",")) { if (!p.trim().isEmpty()) { labels.add(p.trim()); } } if (labels.isEmpty()) { throw new IllegalArgumentException(NO_LABEL_ERR_MSG); } {code} RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM -- Key: YARN-2762 URL: https://issues.apache.org/jira/browse/YARN-2762 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Minor Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.2.patch, YARN-2762.3.patch, YARN-2762.4.patch, YARN-2762.5.patch, YARN-2762.6.patch, YARN-2762.patch All NodeLabel args validations are done at the server side. The same can be done at RMAdminCLI so that unnecessary RPC calls can be avoided. And for input such as x,y,,z, there is no need to add an empty string; it can be skipped instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
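Restored to compilable form, the suggested common method might look like this (the class name, method name, and error-message text here are illustrative, not from the patch): each comma-separated token is trimmed, empties are dropped, and an empty result is rejected before any RPC is made.

```java
import java.util.HashSet;
import java.util.Set;

public class LabelArgs {
    // Hypothetical message; the real constant lives in RMAdminCLI.
    static final String NO_LABEL_ERR_MSG = "No cluster node-labels are specified";

    // Trim each comma-separated token and skip empties, so input like
    // "x,y,,z," yields {x, y, z}; reject input that contains no labels at all.
    static Set<String> parseLabels(String args) {
        Set<String> labels = new HashSet<String>();
        for (String p : args.split(",")) {
            if (!p.trim().isEmpty()) {
                labels.add(p.trim());
            }
        }
        if (labels.isEmpty()) {
            throw new IllegalArgumentException(NO_LABEL_ERR_MSG);
        }
        return labels;
    }
}
```

Doing this check client-side keeps malformed input from ever reaching the RM, which is the point of the JIRA.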
[jira] [Commented] (YARN-2942) Aggregated Log Files should be compacted
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243569#comment-14243569 ] Zhijie Shen commented on YARN-2942: --- bq. Perhaps we can use this JIRA for normal jobs and then use YARN-2548 to add support to it for long running jobs? It makes sense to separate normal applications and long running services, but we need to make sure the logs from long running services are not affected. In other words, compacting won't happen on the log files of long running services. Aggregated Log Files should be compacted Key: YARN-2942 URL: https://issues.apache.org/jira/browse/YARN-2942 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: CompactedAggregatedLogsProposal_v1.pdf, YARN-2942-preliminary.001.patch Turning on log aggregation allows users to easily store container logs in HDFS and subsequently view them in the YARN web UIs from a central place. Currently, there is a separate log file for each Node Manager. This can be a problem for HDFS if you have a cluster with many nodes as you’ll slowly start accumulating many (possibly small) files per YARN application. The current “solution” for this problem is to configure YARN (actually the JHS) to automatically delete these files after some amount of time. We should improve this by compacting the per-node aggregated log files into one log file per application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2946) Deadlock in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243578#comment-14243578 ] Hadoop QA commented on YARN-2946: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686723/0002-YARN-2946.patch against trunk revision f6f2a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 15 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6097//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6097//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6097//console This message is automatically generated. 
Deadlock in ZKRMStateStore -- Key: YARN-2946 URL: https://issues.apache.org/jira/browse/YARN-2946 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-2946.patch, 0002-YARN-2946.patch, TestYARN2946.java Found one deadlock in ZKRMStateStore. # In the initial stage, zkClient is null because of a zk Disconnected event. # When ZKRMStateStore#runWithCheck() does wait(zkSessionTimeout) for zkClient to re-establish the zookeeper connection via either a SyncConnected or an Expired event, it is highly possible that some other thread can obtain the lock on {{ZKRMStateStore.this}} from state machine transition events. This causes the deadlock in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2952) Incorrect version check in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243585#comment-14243585 ] Jian He commented on YARN-2952: --- we may change {{loadedVersion = Version.newInstance(1, 0);}} to {{getCurrentVersion()}} Incorrect version check in RMStateStore --- Key: YARN-2952 URL: https://issues.apache.org/jira/browse/YARN-2952 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He In RMStateStore#checkVersion: if we modify CURRENT_VERSION_INFO to 2.0, it'll still store the version as 1.0, which is incorrect: {code} // if there is no version info, treat it as 1.0; if (loadedVersion == null) { loadedVersion = Version.newInstance(1, 0); } if (loadedVersion.isCompatibleTo(getCurrentVersion())) { LOG.info("Storing RM state version info " + getCurrentVersion()); storeVersion(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
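A compilable sketch of the suggested change, using simplified stand-ins for the real Version record and storeVersion() (the real classes live in the YARN server code, so everything here is illustrative): with the default switched from a hard-coded 1.0 to getCurrentVersion(), bumping CURRENT_VERSION_INFO to 2.0 results in 2.0 being stored rather than a stale 1.0.

```java
public class VersionCheckSketch {
    // Minimal stand-in for the YARN Version record (illustrative only).
    static class Version {
        final int major, minor;
        Version(int major, int minor) { this.major = major; this.minor = minor; }
        boolean isCompatibleTo(Version other) { return major <= other.major; }
        public String toString() { return major + "." + minor; }
    }

    static final Version CURRENT_VERSION_INFO = new Version(2, 0);
    Version storedVersion;  // what a simplified storeVersion() would persist

    Version getCurrentVersion() { return CURRENT_VERSION_INFO; }

    void checkVersion(Version loadedVersion) {
        // Jian He's suggested change: default an absent version to
        // getCurrentVersion() instead of a hard-coded Version.newInstance(1, 0),
        // so a bumped CURRENT_VERSION_INFO is what ends up stored.
        if (loadedVersion == null) {
            loadedVersion = getCurrentVersion();
        }
        if (loadedVersion.isCompatibleTo(getCurrentVersion())) {
            storedVersion = loadedVersion;  // simplified storeVersion()
        }
    }
}
```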
[jira] [Created] (YARN-2952) Incorrect version check in RMStateStore
Jian He created YARN-2952: - Summary: Incorrect version check in RMStateStore Key: YARN-2952 URL: https://issues.apache.org/jira/browse/YARN-2952 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He In RMStateStore#checkVersion: if we modify CURRENT_VERSION_INFO to 2.0, it'll still store the version as 1.0, which is incorrect: {code} // if there is no version info, treat it as 1.0; if (loadedVersion == null) { loadedVersion = Version.newInstance(1, 0); } if (loadedVersion.isCompatibleTo(getCurrentVersion())) { LOG.info("Storing RM state version info " + getCurrentVersion()); storeVersion(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2952) Incorrect version check in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2952: -- Description: In RMStateStore#checkVersion: if we modify CURRENT_VERSION_INFO to 2.0, it'll still store the version as 1.0, which is incorrect; The same thing might happen to the NM store and timeline store. {code} // if there is no version info, treat it as 1.0; if (loadedVersion == null) { loadedVersion = Version.newInstance(1, 0); } if (loadedVersion.isCompatibleTo(getCurrentVersion())) { LOG.info("Storing RM state version info " + getCurrentVersion()); storeVersion(); {code} was: In RMStateStore#checkVersion: if we modify CURRENT_VERSION_INFO to 2.0, it'll still store the version as 1.0, which is incorrect; {code} // if there is no version info, treat it as 1.0; if (loadedVersion == null) { loadedVersion = Version.newInstance(1, 0); } if (loadedVersion.isCompatibleTo(getCurrentVersion())) { LOG.info("Storing RM state version info " + getCurrentVersion()); storeVersion(); {code} Incorrect version check in RMStateStore --- Key: YARN-2952 URL: https://issues.apache.org/jira/browse/YARN-2952 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He In RMStateStore#checkVersion: if we modify CURRENT_VERSION_INFO to 2.0, it'll still store the version as 1.0, which is incorrect; The same thing might happen to the NM store and timeline store. {code} // if there is no version info, treat it as 1.0; if (loadedVersion == null) { loadedVersion = Version.newInstance(1, 0); } if (loadedVersion.isCompatibleTo(getCurrentVersion())) { LOG.info("Storing RM state version info " + getCurrentVersion()); storeVersion(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2946) Deadlock in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2946: -- Affects Version/s: (was: 2.6.0) 2.7.0 Deadlock in ZKRMStateStore -- Key: YARN-2946 URL: https://issues.apache.org/jira/browse/YARN-2946 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-2946.patch, 0002-YARN-2946.patch, TestYARN2946.java Found one deadlock in ZKRMStateStore. # In the initial stage, zkClient is null because of a zk Disconnected event. # When ZKRMStateStore#runWithCheck() does wait(zkSessionTimeout) for zkClient to re-establish the zookeeper connection via either a SyncConnected or an Expired event, it is highly possible that some other thread can obtain the lock on {{ZKRMStateStore.this}} from state machine transition events. This causes the deadlock in ZKRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2929) Adding separator ApplicationConstants.FILE_PATH_SEPARATOR for better Windows support
[ https://issues.apache.org/jira/browse/YARN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243590#comment-14243590 ] Tsuyoshi OZAWA commented on YARN-2929: -- Sorry for the lack of explanation, and thanks for your clarification. {quote} I am assuming here that a path originating from a shell script would get passed into a Spark Java process, where Spark code would have an opportunity to use the Path class like I described. {quote} In fact, this problem happens when launching the AM, before the AM's JVM is launched. For instance, the AM needs a classpath to launch, and sometimes the classpath includes subdirectories with path separators. This is one such case: the path separators cannot be parsed on the OS, and it fails to find the jar. {code} // JVM option in launch-container.sh -classpath=local/lib/hadoop # cannot be parsed on Windows {code} For that case, the following path is converted into platform-dependent paths: {code} // JVM option in launch-container.sh -classpath=local<FPS>lib<FPS>hadoop # can be converted into platform-dependent paths by expandEnvironments {code} {quote} By minimizing token replacement rules like this, we'd reduce the number of special cases that YARN application writers would need to consider. {quote} Yes, I agree with you. If we don't need it, we shouldn't include modifications like this. Adding separator ApplicationConstants.FILE_PATH_SEPARATOR for better Windows support Key: YARN-2929 URL: https://issues.apache.org/jira/browse/YARN-2929 Project: Hadoop YARN Issue Type: Improvement Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2929.001.patch Some frameworks like Spark are working to run jobs on Windows (SPARK-1825). For better multi-platform support, we should introduce ApplicationConstants.FILE_PATH_SEPARATOR to make file paths platform-independent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
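The intent of the proposal can be illustrated with a small sketch (the `<FPS>` token text and the expand helper are hypothetical stand-ins for ApplicationConstants.FILE_PATH_SEPARATOR and the NodeManager-side expansion): the launch command carries a platform-neutral separator token that each NodeManager expands with its own local separator before executing it.

```java
public class PathSeparatorSketch {
    // Hypothetical token, in the spirit of ApplicationConstants.FILE_PATH_SEPARATOR.
    static final String FILE_PATH_SEPARATOR = "<FPS>";

    // Each NodeManager would expand the token with its local separator,
    // so one launch command works on both Linux ("/") and Windows ("\").
    static String expand(String value, String localSeparator) {
        return value.replace(FILE_PATH_SEPARATOR, localSeparator);
    }

    public static void main(String[] args) {
        String cp = "local" + FILE_PATH_SEPARATOR + "lib" + FILE_PATH_SEPARATOR + "hadoop";
        System.out.println(expand(cp, "/"));   // what a Linux NM would produce
        System.out.println(expand(cp, "\\"));  // what a Windows NM would produce
    }
}
```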
[jira] [Updated] (YARN-2943) Add a node-labels page in RM web UI
[ https://issues.apache.org/jira/browse/YARN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2943: - Attachment: YARN-2943.2.patch Hi [~jianhe], Thanks for your comments, bq. If I restart NM, “num of active NMs” is incorrect; this is probably because NM is using ephemeral ports; This is what the RM can see about the NM; after some time has elapsed, a lost NM will be marked as LOST and will no longer be counted on the RM side either. bq. “NO_LABEL” - “N/A”, similarly, update the previous nodes page to show “N/A” for not labeled nodes; IMO, N/A is not clear enough. Per Wikipedia (http://en.wikipedia.org/wiki/N/a), N/A means not available, not applicable or no answer. Here no-label is just a special kind of label, since NO_LABEL has all the same characteristics as other normal labels, like exclusivity, etc. bq. usability: num of active NMs link can point to a filtered list of nodes by label ? It already works as you suggested: when a user/admin clicks the number, it links to the NM page and shows only the NMs that have that label. And I've addressed the rest of your comments. Add a node-labels page in RM web UI --- Key: YARN-2943 URL: https://issues.apache.org/jira/browse/YARN-2943 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: Node-labels-page.png, Nodes-page-with-label-filter.png, YARN-2943.1.patch, YARN-2943.2.patch Now we have node labels in the system, but there's no very convenient way to get information like: how many active NM(s) are assigned to a given label?, how much total resource is there for a given label?, for a given label, which queues can access it?, etc. It will be better to add a node-labels page in the RM web UI, so users/admins can have a centralized view of such information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243608#comment-14243608 ] Carlo Curino commented on YARN-2009: [~sunilg], pardon the long delay... From what you say, it seems the priority issue within a queue is important for you and you observe non-trivial delays. If that is the case, I think it is fine to venture into adding within-queue cross-app preemption. I would argue in favor of a conservative policy with several built-in dampers (like max per-round preemptions, deadzones, fraction of imbalance) like we did for cross-queue. Also we should be careful not to make it too expensive (if we have thousands of apps in the queues, we should be mindful of not overloading the RM with costly rebalancing algos, and extra scheduling decisions derived from preemption). What [~eepayne] says also makes sense: if we have preemptions triggered by cross-queue imbalances, it would be good to spend them correcting the issue you observed. Priority support for preemption in ProportionalCapacityPreemptionPolicy --- Key: YARN-2009 URL: https://issues.apache.org/jira/browse/YARN-2009 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Devaraj K Assignee: Sunil G While preempting containers based on the queue ideal assignment, we may need to consider preempting the low priority application containers first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose
[ https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2356: -- Attachment: 0003-YARN-2356.patch Thank you [~devaraj.k]. I have updated the patch against trunk. Kindly check. yarn status command for non-existent application/application attempt/container is too verbose -- Key: YARN-2356 URL: https://issues.apache.org/jira/browse/YARN-2356 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Sunil G Assignee: Sunil G Priority: Minor Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, 0003-YARN-2356.patch, Yarn-2356.1.patch *yarn application -status* or *applicationattempt -status* or *container status* commands can suppress exceptions such as ApplicationNotFound, ApplicationAttemptNotFound and ContainerNotFound for non-existent entries in the RM or History Server. For example, the exception below can be suppressed better: sunildev@host-a:~/hadoop/hadoop/bin ./yarn application -status application_1402668848165_0015 No GC_PROFILE is given. Defaults to medium. 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at /10.18.40.77:45022 Exception in thread "main" org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1402668848165_0015' doesn't exist in RM. 
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) 
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at $Proxy12.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:76) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException): Application with id 'application_1402668848165_0015' doesn't exist in RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)