[jira] [Assigned] (YARN-3235) Support uniformed scheduler configuration in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-3235: --- Assignee: Naganarasimha G R Support uniformed scheduler configuration in FairScheduler -- Key: YARN-3235 URL: https://issues.apache.org/jira/browse/YARN-3235 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Naganarasimha G R -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3235) Support uniformed scheduler configuration in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328829#comment-14328829 ] Naganarasimha G R commented on YARN-3235: - Hi [~wangda] [~kasha], I would like to work on this issue, so I am assigning it to myself. If you have any other plan, or if somebody else is already working on it, please let me know. Support uniformed scheduler configuration in FairScheduler -- Key: YARN-3235 URL: https://issues.apache.org/jira/browse/YARN-3235 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Naganarasimha G R -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328848#comment-14328848 ] Hudson commented on YARN-933: - SUCCESS: Integrated in Hadoop-Yarn-trunk #844 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/844/]) YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. Contributed by Rohith Sharmaks (jianhe: rev c0d9b93953767608dfe429ddb9bd4c1c3bd3debf) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/CHANGES.txt Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING - Key: YARN-933 URL: https://issues.apache.org/jira/browse/YARN-933 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.5-alpha Reporter: J.Andreina Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch am max retries configured as 3 at client and RM side. Step 1: Install cluster with NM on 2 Machines Step 2: Make Ping using ip from RM machine to NM1 machine as successful ,But using Hostname should fail Step 3: Execute a job Step 4: After AM [ AppAttempt_1 ] allocation to NM1 machine is done , connection loss happened. Observation : == After AppAttempt_1 has moved to failed state ,release of container for AppAttempt_1 and Application removal are successful. New AppAttempt_2 is sponed. 1. Then again retry for AppAttempt_1 happens. 2. Again RM side it is trying to launch AppAttempt_1, hence fails with InvalidStateTransitonException 3. Client got exited after AppAttempt_1 is been finished [But actually job is still running ], while the appattempts configured is 3 and rest appattempts are all sponed and running. RMLogs: == 2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45 2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. 
finalState=FAILED 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35 2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02, 2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45 2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01. Got exception: java.lang.reflect.UndeclaredThrowableException 2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at
[jira] [Commented] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328827#comment-14328827 ] Hudson commented on YARN-3076: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #110 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/110/]) YARN-3076. Add API/Implementation to YarnClient to retrieve label-to-node mapping (Varun Saxena via wangda) (wangda: rev d49ae725d5fa3eecf879ac42c42a368dd811f854) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto Add API/Implementation to YarnClient to retrieve label-to-node mapping -- Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3076.001.patch, YARN-3076.002.patch, YARN-3076.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328826#comment-14328826 ] Hudson commented on YARN-933: - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #110 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/110/]) YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. Contributed by Rohith Sharmaks (jianhe: rev c0d9b93953767608dfe429ddb9bd4c1c3bd3debf) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/CHANGES.txt Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING - Key: YARN-933 URL: https://issues.apache.org/jira/browse/YARN-933 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.5-alpha Reporter: J.Andreina Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch am max retries configured as 3 at client and RM side. Step 1: Install cluster with NM on 2 Machines Step 2: Make Ping using ip from RM machine to NM1 machine as successful ,But using Hostname should fail Step 3: Execute a job Step 4: After AM [ AppAttempt_1 ] allocation to NM1 machine is done , connection loss happened. Observation : == After AppAttempt_1 has moved to failed state ,release of container for AppAttempt_1 and Application removal are successful. New AppAttempt_2 is sponed. 1. Then again retry for AppAttempt_1 happens. 2. Again RM side it is trying to launch AppAttempt_1, hence fails with InvalidStateTransitonException 3. Client got exited after AppAttempt_1 is been finished [But actually job is still running ], while the appattempts configured is 3 and rest appattempts are all sponed and running. RMLogs: == 2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45 2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. 
finalState=FAILED 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35 2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02, 2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45 2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01. Got exception: java.lang.reflect.UndeclaredThrowableException 2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
[jira] [Commented] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328849#comment-14328849 ] Hudson commented on YARN-3076: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #844 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/844/]) YARN-3076. Add API/Implementation to YarnClient to retrieve label-to-node mapping (Varun Saxena via wangda) (wangda: rev d49ae725d5fa3eecf879ac42c42a368dd811f854) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java Add API/Implementation to YarnClient to retrieve label-to-node mapping -- Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3076.001.patch, YARN-3076.002.patch, YARN-3076.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328853#comment-14328853 ] Sunil G commented on YARN-3226: --- Yes, I also feel we could remove some dead tabs and then add a new tab for this state. A filtering mechanism can be added in a new tab that just shows decommissioned nodes. I will look into this and work along those lines. Thank you. UI changes for decommissioning node --- Key: YARN-3226 URL: https://issues.apache.org/jira/browse/YARN-3226 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Sunil G Some initial thought is: decommissioning nodes should still show up in the active nodes list since they are still running containers. A separate decommissioning tab to filter for those nodes would be nice, although I suppose users can also just use the jquery table to sort/search for nodes in that state from the active nodes list if it's too crowded to add yet another node state tab (or maybe get rid of some effectively dead tabs like the reboot state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328681#comment-14328681 ] Varun Saxena commented on YARN-3047: Thanks [~sjlee0] for the review. I guess you are ok with other responses to the initial review. [~zjshen], can you have a look as well once you are free ? I will upload a new patch based on comments from both of you. [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3047.001.patch Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3195) Add -help to yarn logs and nodes CLI command
[ https://issues.apache.org/jira/browse/YARN-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-3195: Environment: (was: SUSE Linux SP3) Issue Type: Improvement (was: Bug) Add -help to yarn logs and nodes CLI command Key: YARN-3195 URL: https://issues.apache.org/jira/browse/YARN-3195 Project: Hadoop YARN Issue Type: Improvement Components: client Affects Versions: 2.6.0 Reporter: Jagadesh Kiran N Assignee: Jagadesh Kiran N Priority: Minor Fix For: 2.7.0 Attachments: Helptobe removed in Queue.png, YARN-3195.patch Help is generic command should not be placed here because of this uniformity is missing compared to other commands.Remove -help command inside ./yarn queue as uniformity with respect to other commands {code} SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue -help 15/02/13 19:30:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable usage: queue * -help Displays help for all commands.* -status Queue Name List queue information about given queue. SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue 15/02/13 19:33:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Invalid Command Usage : usage: queue * -help Displays help for all commands.* -status Queue Name List queue information about given queue. {code} * -help Displays help for all commands.* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3195) Add -help to yarn logs and nodes CLI command
[ https://issues.apache.org/jira/browse/YARN-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-3195: Summary: Add -help to yarn logs and nodes CLI command (was: [YARN]Missing uniformity In Yarn Queue CLI command) Add -help to yarn logs and nodes CLI command Key: YARN-3195 URL: https://issues.apache.org/jira/browse/YARN-3195 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Environment: SUSE Linux SP3 Reporter: Jagadesh Kiran N Assignee: Jagadesh Kiran N Priority: Minor Fix For: 2.7.0 Attachments: Helptobe removed in Queue.png, YARN-3195.patch Help is generic command should not be placed here because of this uniformity is missing compared to other commands.Remove -help command inside ./yarn queue as uniformity with respect to other commands {code} SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue -help 15/02/13 19:30:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable usage: queue * -help Displays help for all commands.* -status Queue Name List queue information about given queue. SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue 15/02/13 19:33:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Invalid Command Usage : usage: queue * -help Displays help for all commands.* -status Queue Name List queue information about given queue. {code} * -help Displays help for all commands.* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3195) [YARN]Missing uniformity In Yarn Queue CLI command
[ https://issues.apache.org/jira/browse/YARN-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328669#comment-14328669 ] Devaraj K commented on YARN-3195: - Thanks [~jagadesh.kiran] for your work. Overall the patch is ok, except for the points below. 1. These tests are failing due to the patch changes; can you have a look into them? {code:xml} org.apache.hadoop.yarn.client.cli.TestLogsCLI org.apache.hadoop.yarn.client.cli.TestYarnCLI {code} 2. We need to return the success exit code (i.e. 0) for the help case, since the help command execution is a success. {code:xml} +if (args.length < 1 || args[0].equals("-help")) { printHelpMessage(printOpts); return -1; } {code} BTW, can you also take care of formatting the newly added code and avoid adding unnecessary new lines when you create a patch. [YARN]Missing uniformity In Yarn Queue CLI command --- Key: YARN-3195 URL: https://issues.apache.org/jira/browse/YARN-3195 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Environment: SUSE Linux SP3 Reporter: Jagadesh Kiran N Assignee: Jagadesh Kiran N Priority: Minor Fix For: 2.7.0 Attachments: Helptobe removed in Queue.png, YARN-3195.patch Help is a generic command and should not be placed here; because of this, uniformity is missing compared to other commands. Remove the -help command inside ./yarn queue for uniformity with respect to other commands {code} SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue -help 15/02/13 19:30:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable usage: queue * -help Displays help for all commands.* -status Queue Name List queue information about given queue. SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue 15/02/13 19:33:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Invalid Command Usage : usage: queue * -help Displays help for all commands.* -status Queue Name List queue information about given queue. {code} * -help Displays help for all commands.* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
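A minimal sketch of the exit-code suggestion in point 2 above, assuming the same option-parsing context as the quoted snippet (args, printOpts, printHelpMessage inside the command's run() method); the actual patch may differ:
{code}
// Sketch only: return 0 when the user explicitly asks for help, and keep the error
// exit code for an invalid invocation. Names are taken from the quoted snippet;
// the enclosing run() method and its return convention are assumed.
if (args.length < 1) {
  printHelpMessage(printOpts);
  return -1;   // no arguments: still an error
}
if (args[0].equals("-help")) {
  printHelpMessage(printOpts);
  return 0;    // explicit help request should exit successfully
}
{code}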
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328675#comment-14328675 ] Varun Saxena commented on YARN-3197: [~devaraj.k] Agreed that making only the log associated with null-container completion DEBUG doesn't seem right. So what do you think? Should I print "Unknown container" or "Non-alive container"? Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Attachments: YARN-3197.001.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328691#comment-14328691 ] Devaraj K commented on YARN-3197: - I would be ok with either; better still, you can add both of them, like below. "Unknown or non-alive container " + containerId + " completed with" Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Attachments: YARN-3197.001.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
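For illustration, a hedged sketch of what the combined message could look like at the point where the scheduler detects a null container; the variable names (rmContainer, containerStatus, event) are assumed from the surrounding completedContainer method and this is not the committed change:
{code}
// Illustrative sketch, not the actual patch: log enough context to identify the
// completed container instead of the bare "Null container completed..." message.
if (rmContainer == null) {
  LOG.info("Unknown or non-alive container " + containerStatus.getContainerId()
      + " completed with event " + event + ", ignoring.");
  return;
}
{code}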
[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328862#comment-14328862 ] Tsuyoshi OZAWA commented on YARN-2820: -- [~zxu] Thank you for updating the patch. 1. Should we create *WithRetries methods for deleteFile/renameFile/createFile/getFileStatus too? Note that we should update replaceFile to use renameFileWithRetries instead of calling fs.rename(srcPath, dstPath) directly: {code} protected void replaceFile(Path srcPath, Path dstPath) throws Exception { if (fs.exists(dstPath)) { deleteFile(dstPath); } else { LOG.info("File doesn't exist. Skip deleting the file " + dstPath); } fs.rename(srcPath, dstPath); } {code} 2. Should we create existsWithRetries and use it instead of fs.exists()? 3. Please move *WithRetries methods below the following comment: {code} // FileSystem related code {code} Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException. -- Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch, YARN-2820.002.patch, YARN-2820.003.patch Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we saw the following IOException cause the RM to shut down. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01 java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01 2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at
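As a rough illustration of the *WithRetries idea discussed in the comment above, the sketch below shows one possible shape of a rename wrapper and a replaceFile that uses it. The retry-count and interval fields (fsNumRetries, fsRetryIntervalMs) and the existsWithRetries/deleteFileWithRetries helpers are assumptions for the sketch, not the actual YARN-2820 patch:
{code}
// Hypothetical sketch: retry fs.rename on IOException and reuse it from replaceFile.
private boolean renameFileWithRetries(final Path src, final Path dst) throws Exception {
  Exception lastError = null;
  for (int attempt = 0; attempt < fsNumRetries; attempt++) {  // fsNumRetries: assumed config field
    try {
      return fs.rename(src, dst);
    } catch (IOException e) {
      lastError = e;
      LOG.info("Retrying rename of " + src + " to " + dst, e);
      Thread.sleep(fsRetryIntervalMs);                        // fsRetryIntervalMs: assumed config field
    }
  }
  throw lastError;
}

protected void replaceFile(Path srcPath, Path dstPath) throws Exception {
  if (existsWithRetries(dstPath)) {        // existsWithRetries: wrapper suggested in the comment
    deleteFileWithRetries(dstPath);        // deleteFileWithRetries: wrapper suggested in the comment
  } else {
    LOG.info("File doesn't exist. Skip deleting the file " + dstPath);
  }
  renameFileWithRetries(srcPath, dstPath);
}
{code}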
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328948#comment-14328948 ] Hudson commented on YARN-933: - FAILURE: Integrated in Hadoop-Hdfs-trunk #2042 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2042/]) YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. Contributed by Rohith Sharmaks (jianhe: rev c0d9b93953767608dfe429ddb9bd4c1c3bd3debf) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/CHANGES.txt Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING - Key: YARN-933 URL: https://issues.apache.org/jira/browse/YARN-933 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.5-alpha Reporter: J.Andreina Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch am max retries configured as 3 at client and RM side. Step 1: Install cluster with NM on 2 Machines Step 2: Make Ping using ip from RM machine to NM1 machine as successful ,But using Hostname should fail Step 3: Execute a job Step 4: After AM [ AppAttempt_1 ] allocation to NM1 machine is done , connection loss happened. Observation : == After AppAttempt_1 has moved to failed state ,release of container for AppAttempt_1 and Application removal are successful. New AppAttempt_2 is sponed. 1. Then again retry for AppAttempt_1 happens. 2. Again RM side it is trying to launch AppAttempt_1, hence fails with InvalidStateTransitonException 3. Client got exited after AppAttempt_1 is been finished [But actually job is still running ], while the appattempts configured is 3 and rest appattempts are all sponed and running. RMLogs: == 2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45 2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. 
finalState=FAILED 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35 2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02, 2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45 2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01. Got exception: java.lang.reflect.UndeclaredThrowableException 2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at
[jira] [Commented] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328949#comment-14328949 ] Hudson commented on YARN-3076: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2042 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2042/]) YARN-3076. Add API/Implementation to YarnClient to retrieve label-to-node mapping (Varun Saxena via wangda) (wangda: rev d49ae725d5fa3eecf879ac42c42a368dd811f854) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesRequest.java Add API/Implementation to YarnClient to retrieve label-to-node mapping -- Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3076.001.patch, YARN-3076.002.patch, YARN-3076.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328955#comment-14328955 ] Hudson commented on YARN-3076: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #101 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/101/]) YARN-3076. Add API/Implementation to YarnClient to retrieve label-to-node mapping (Varun Saxena via wangda) (wangda: rev d49ae725d5fa3eecf879ac42c42a368dd811f854) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java Add API/Implementation to YarnClient to retrieve label-to-node mapping -- Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3076.001.patch, YARN-3076.002.patch, YARN-3076.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328954#comment-14328954 ] Hudson commented on YARN-933: - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #101 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/101/]) YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. Contributed by Rohith Sharmaks (jianhe: rev c0d9b93953767608dfe429ddb9bd4c1c3bd3debf) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING - Key: YARN-933 URL: https://issues.apache.org/jira/browse/YARN-933 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.5-alpha Reporter: J.Andreina Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch am max retries configured as 3 at client and RM side. Step 1: Install cluster with NM on 2 Machines Step 2: Make Ping using ip from RM machine to NM1 machine as successful ,But using Hostname should fail Step 3: Execute a job Step 4: After AM [ AppAttempt_1 ] allocation to NM1 machine is done , connection loss happened. Observation : == After AppAttempt_1 has moved to failed state ,release of container for AppAttempt_1 and Application removal are successful. New AppAttempt_2 is sponed. 1. Then again retry for AppAttempt_1 happens. 2. Again RM side it is trying to launch AppAttempt_1, hence fails with InvalidStateTransitonException 3. Client got exited after AppAttempt_1 is been finished [But actually job is still running ], while the appattempts configured is 3 and rest appattempts are all sponed and running. RMLogs: == 2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45 2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. 
finalState=FAILED 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35 2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02, 2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45 2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01. Got exception: java.lang.reflect.UndeclaredThrowableException 2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329001#comment-14329001 ] Jason Lowe commented on YARN-3131: -- bq. I do not think that continuously polling until RUNNING is a good idea. The most common case on a busy cluster is that an app can be submitted at time X but not start running until a long time later. The patch does not cause the client to poll until the job is RUNNING. It polls until the job has progressed past the SUBMITTED state. The SUBMITTED state is a brief transient state before the ACCEPTED state. So the client will wait approximately as long as it does today, and it fixes that flaky submit unit test in Tez. It will not block until the AM is actually running. bq. As I mentioned earlier, I still believe that doing some basic checks in-line in ClientRMService itself and throwing an exception back straight away is probably a better idea than polling for any RUNNING/FAILED state. I agree that a blocking method is much easier on the client, but I don't think this is an easy change to make in the short term. Again I think it requires a major change to the RPC layer and the RM to support server-side asynchronous call handling, otherwise we have to throw an army of threads at the client service to avoid blocking other clients and that has scaling issues. We could probably add an API to the scheduler to do an in-line sanity check on the requested queue (which is a backwards-incompatible change for schedulers not in the Hadoop repo). However there are many other things that could go wrong during submission that take a long time to perform, such as saving the application state and renewing delegation tokens. I'm not sure it's a win if we check for one thing in-line that could go wrong but still have to poll for all the other things that could go wrong. In the end, Tez and other YARN clients need to know if the app was accepted or not. The queue being wrong is just one of the ways the submit could fail. Continuing to poll in the SUBMITTED state also meshes with the thoughts on the SUBMITTED state being something the client probably shouldn't see anyway. See the discussion about NEW_SAVING and SUBMITTED in YARN-3230. Thanks, Chang, for updating the patch. Please investigate the unit test failure, as it looks like it could be related. My only nit on the patch is it would be a bit clearer and more efficient if we used EnumSet constants to capture the set of states we're waiting the app to leave and the set of states that are failed-to-submit states. I suppose another way to solve this problem is to take the approach discussed in YARN-3230 and have the RM not expose the NEW_SAVING and SUBMITTED states to the client -- they would just see NEW. We'd have to leave the states in the enumeration for backwards compatibility, but we'd stop exposing them in app reports. Any thoughts on that [~zjshen] or [~jianhe]? YarnClientImpl should check FAILED and KILLED state in submitApplication Key: YARN-3131 URL: https://issues.apache.org/jira/browse/YARN-3131 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: yarn_3131_v1.patch Just run into a issue when submit a job into a non-existent queue and YarnClient raise no exception. Though that job indeed get submitted successfully and just failed immediately after, it will be better if YarnClient can handle the immediate fail situation like YarnRunner does -- This message was sent by Atlassian JIRA (v6.3.4#6332)
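A small sketch of the EnumSet nit from the comment above; the constant names, poll-interval field, and exact membership are illustrative (based on the states discussed in this thread), not the final patch:
{code}
// Illustrative only: capture the transient "still submitting" states and the
// failed-submission states as EnumSet constants, then poll until the app leaves
// the former. getApplicationReport() is the existing YarnClient call.
private static final EnumSet<YarnApplicationState> WAITING_STATES =
    EnumSet.of(YarnApplicationState.NEW,
               YarnApplicationState.NEW_SAVING,
               YarnApplicationState.SUBMITTED);

private static final EnumSet<YarnApplicationState> FAILED_SUBMISSION_STATES =
    EnumSet.of(YarnApplicationState.FAILED,
               YarnApplicationState.KILLED);

// Inside submitApplication(), after the submit RPC (submitPollIntervalMs is an assumed field):
YarnApplicationState state =
    getApplicationReport(applicationId).getYarnApplicationState();
while (WAITING_STATES.contains(state)) {
  Thread.sleep(submitPollIntervalMs);
  state = getApplicationReport(applicationId).getYarnApplicationState();
}
if (FAILED_SUBMISSION_STATES.contains(state)) {
  throw new YarnException("Failed to submit " + applicationId + ", state is " + state);
}
{code}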
[jira] [Commented] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329021#comment-14329021 ] Hudson commented on YARN-3076: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #111 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/111/]) YARN-3076. Add API/Implementation to YarnClient to retrieve label-to-node mapping (Varun Saxena via wangda) (wangda: rev d49ae725d5fa3eecf879ac42c42a368dd811f854) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java * hadoop-yarn-project/CHANGES.txt Add API/Implementation to YarnClient to retrieve label-to-node mapping -- Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3076.001.patch, YARN-3076.002.patch, YARN-3076.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3237) AppLogAggregatorImpl fails to log error if it is unable to create LogWriter
[ https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329505#comment-14329505 ] Jason Lowe commented on YARN-3237: -- I would just add it to this patch since it's a similar and very closely related fix. The JIRA title can be changed to something like AppLogAggregatorImpl fails to log error cause. AppLogAggregatorImpl fails to log error if it is unable to create LogWriter --- Key: YARN-3237 URL: https://issues.apache.org/jira/browse/YARN-3237 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: YARN-3237.patch AppLogAggregatorImpl fails to log the error if it is unable to create LogWriter. Below is the log output: [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: Cannot create writer for app app_id. Disabling log-aggregation for this app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
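For context, a hedged sketch of the kind of change being discussed: pass the caught exception to the logger so the underlying cause appears in the NodeManager log. The constructor call and field names are assumed from the error message quoted above, not copied from the attached patch:
{code}
// Illustrative sketch only.
AggregatedLogFormat.LogWriter writer = null;
try {
  writer = new AggregatedLogFormat.LogWriter(conf, remoteNodeTmpLogFileForApp, userUgi);
} catch (Exception e) {
  // Passing 'e' as the second argument makes the cause and stack trace visible in the log.
  LOG.error("Cannot create writer for app " + applicationId
      + ". Disabling log-aggregation for this app.", e);
  return;
}
{code}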
[jira] [Commented] (YARN-3237) AppLogAggregatorImpl fails to log error cause
[ https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329544#comment-14329544 ] Hadoop QA commented on YARN-3237: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699927/YARN-3237.patch against trunk revision ce5bf92. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6685//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6685//console This message is automatically generated. AppLogAggregatorImpl fails to log error cause - Key: YARN-3237 URL: https://issues.apache.org/jira/browse/YARN-3237 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: YARN-3237-v2.patch, YARN-3237.patch AppLogAggregatorImpl fails to log the error if it is unable to create LogWriter. Below is the log output: [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: Cannot create writer for app app_id. Disabling log-aggregation for this app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3237) AppLogAggregatorImpl fails to log error cause
[ https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329643#comment-14329643 ] Xuan Gong commented on YARN-3237: - Committed into trunk/branch-2. Thanks, Rushabh AppLogAggregatorImpl fails to log error cause - Key: YARN-3237 URL: https://issues.apache.org/jira/browse/YARN-3237 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Fix For: 2.7.0 Attachments: YARN-3237-v2.patch, YARN-3237.patch AppLogAggregatorImpl fails to log the error if it is unable to create LogWriter. Below is the log output: [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: Cannot create writer for app app_id. Disabling log-aggregation for this app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329737#comment-14329737 ] Sangjin Lee commented on YARN-3039: --- Thanks [~djp] for the doc! Some high level comments: - I'm also thinking that option 2 might be more feasible, mostly from the standpoint of limiting the risk. Having said that, I haven't followed YARN-913 closely enough to see how close it is... - The service discovery needs to work across all these different modes: NM aux service, standalone per-node daemon, and standalone per-app daemon. That needs to be one of the primary considerations in this. - The failure scenarios need more details in their own right; for this JIRA, I think it is sufficient to see how it may impact the service discovery and design just enough. {quote} We need a per-application logical aggregator for ATS which provides aggregator service in form of REST API to: RM, AM and NMs, {quote} The RM will likely not use the service discovery. For example, for RM to write the app started event, the timeline aggregator may not even be initialized yet. {quote} However, AM container could be reschedule to other node for some reason (container failure, etc.), so we cannot guarantee the two are always together. {quote} If the AM fails and starts on another node, the existing per-app aggregator should be shut down, and started on the new node. In fact, in the aux service setup, that comes most naturally. So I think we should try to keep that as much as possible. {quote} Failure Cases: 3. Aggregator failed (only): {quote} We're talking about the aggregator failing as a standalone daemon, correct? [Aggregator wireup] Implement ATS writer service discovery -- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Robert Kanter Attachments: Service Binding for applicationaggregator of ATS (draft).pdf Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params
Hitesh Shah created YARN-3239: - Summary: WebAppProxy does not support a final tracking url which has query fragments and params Key: YARN-3239 URL: https://issues.apache.org/jira/browse/YARN-3239 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Examples of failures: Expected: {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}} Actual: {{http://uihost:8080}} Tried with a minor change to remove the #. Saw a different issue: Expected: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}} Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}} yarn application -status appId returns the expected value correctly. However, invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
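A small aside on what the two examples above show, based only on the URLs quoted in the report and not on the WebAppProxy code: in the first case everything after the '#' lives in the URI fragment, so any handling that keeps only the scheme, host and path would collapse the URL to exactly the observed http://uihost:8080. A minimal sketch with java.net.URI (the class name below is made up for illustration):
{code}
import java.net.URI;

public class TrackingUrlParts {
  public static void main(String[] args) {
    // Expected tracking URL copied from the report above.
    URI expected = URI.create(
        "http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez"
        + "?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005");

    System.out.println("scheme+host: " + expected.getScheme() + "://" + expected.getAuthority());
    System.out.println("path       : " + expected.getPath());        // just "/"
    // The view path and its parameters all sit in the fragment because they
    // follow the '#'; keeping only scheme and authority matches the observed
    // "Actual" value in the report.
    System.out.println("fragment   : " + expected.getRawFragment());
  }
}
{code}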
[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params
[ https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329741#comment-14329741 ] Hitesh Shah commented on YARN-3239: --- [~jlowe] [~jeagles] Have you come across any cases such as this? WebAppProxy does not support a final tracking url which has query fragments and params --- Key: YARN-3239 URL: https://issues.apache.org/jira/browse/YARN-3239 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Examples of failures: Expected: {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}} Actual: {{http://uihost:8080}} Tried with a minor change to remove the #. Saw a different issue: Expected: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}} Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}} yarn application -status appId returns the expected value correctly. However, invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3166) [Source organization] Decide detailed package structures for timeline service v2 components
[ https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329744#comment-14329744 ] Sangjin Lee commented on YARN-3166: --- {quote} 1. In TimelineClient, we keep the existing methods that operate on the old data model, but mark them deprecated individually. 2. In TimelineClient, we create the new methods that operate on the new data model. {quote} Just to note the obvious, that would not work if it is a public interface that other code implements. If it is internal, yes, we could evolve the interface that way, but if it is a public interface, that is not an option... So for TimelineClient specifically, I think that holds true. But as a general rule, I think there could be issues... [Source organization] Decide detailed package structures for timeline service v2 components --- Key: YARN-3166 URL: https://issues.apache.org/jira/browse/YARN-3166 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Open this JIRA to track all discussions on detailed package structures for timeline services v2. This JIRA is for discussion only. For our current timeline service v2 design, aggregator (previously called writer) implementation is in hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.aggregator}} In YARN-2928's design, the next gen ATS reader is also a server. Maybe we want to put reader related implementations into hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.reader}} Both readers and aggregators will expose features that may be used by YARN and other 3rd party components, such as aggregator/reader APIs. For those features, maybe we would like to expose their interfaces to hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? Let's use this JIRA as a centralized place to track all related discussions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
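To make the quoted two-step proposal concrete, here is a tiny hedged sketch; the class and entity type names are placeholders rather than the real TimelineClient API, and as noted above this evolution only works cleanly when callers invoke the class rather than implement it as a public interface:
{code}
// Hypothetical client class; all names here are illustrative placeholders.
public abstract class TimelineClientEvolutionSketch {

  /** Step 1: keep the old-data-model method, deprecated individually. */
  @Deprecated
  public abstract void putEntities(OldTimelineEntity... entities);

  /** Step 2: add the new-data-model method alongside the deprecated one. */
  public abstract void putEntities(NewTimelineEntity... entities);

  /** Placeholder types standing in for the v1 and v2 entity classes. */
  public static class OldTimelineEntity { }
  public static class NewTimelineEntity { }
}
{code}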
[jira] [Commented] (YARN-3237) AppLogAggregatorImpl fails to log error cause
[ https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329646#comment-14329646 ] Rushabh S Shah commented on YARN-3237: -- Thanks Xuan for committing. AppLogAggregatorImpl fails to log error cause - Key: YARN-3237 URL: https://issues.apache.org/jira/browse/YARN-3237 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Fix For: 2.7.0 Attachments: YARN-3237-v2.patch, YARN-3237.patch AppLogAggregatorImpl fails to log the error if it is unable to create LogWriter. Below is the log output: [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: Cannot create writer for app app_id. Disabling log-aggregation for this app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329708#comment-14329708 ] Robert Kanter commented on YARN-2423: - [~vinodkv], as [~zjshen] pointed out earlier, these Java APIs are very tied to the REST APIs. So, if there ends up being a compatibility problem with the Java API, I'd imagine the REST API would have the same problem. And given that the new ATS will still take some time, it would be very useful to make a Java API available in the meantime, even if we have to eventually deprecate it. Even though we're making a new ATS, many users are still using the older one. As [~vanzin] pointed out, this JIRA is a blocker for SPARK-1537; it would make them better able to use the current ATS. TimelineClient should wrap all GET APIs to facilitate Java users Key: YARN-2423 URL: https://issues.apache.org/jira/browse/YARN-2423 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Attachments: YARN-2423.004.patch, YARN-2423.005.patch, YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, YARN-2423.patch TimelineClient provides the Java method to put timeline entities. It's also good to wrap over all GET APIs (both entity and domain), and deserialize the json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-3238: - Attachment: YARN-3238.001.patch Since the IPC layer is already retrying it doesn't make sense to also retry at the YARN layer. Attaching a patch that removes socket connection timeouts from the list of errors we retry at the YARN layer. An alternate approach would be to retry at the YARN layer but explicitly tell the IPC layer to _not_ retry socket timeouts when creating the proxy. This change seemed simpler and is what we've been doing all along before YARN-2613. Connection timeouts to nodemanagers are retried at multiple levels -- Key: YARN-3238 URL: https://issues.apache.org/jira/browse/YARN-3238 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Priority: Blocker Attachments: YARN-3238.001.patch The IPC layer will retry connection timeouts automatically (see Client.java), but we are also retrying them with YARN's RetryPolicy put in place when the NM proxy is created. This causes a two-level retry mechanism where the IPC layer has already retried quite a few times (45 by default) for each YARN RetryPolicy error that is retried. The end result is that NM clients can wait a very, very long time for the connection to finally fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
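As a rough illustration of the approach described above (this is a hedged sketch, not the attached YARN-3238.001.patch): a YARN-layer retry policy can map socket connect timeouts to an immediate failure while still retrying ordinary connection refusals. The RetryPolicies methods and ConnectTimeoutException are existing Hadoop classes, but the wiring, values and class name below are assumptions for illustration:
{code}
import java.net.ConnectException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;
import org.apache.hadoop.net.ConnectTimeoutException;

public class NMProxyRetryPolicySketch {
  // Build a policy that retries most connect errors at the YARN layer but
  // gives up immediately on socket connect timeouts, since the IPC layer
  // (Client.java) already retries those many times on its own.
  public static RetryPolicy create(long maxWaitMs, long retryIntervalMs) {
    RetryPolicy basePolicy = RetryPolicies.retryUpToMaximumTimeWithFixedSleep(
        maxWaitMs, retryIntervalMs, TimeUnit.MILLISECONDS);

    Map<Class<? extends Exception>, RetryPolicy> exceptionToPolicy =
        new HashMap<Class<? extends Exception>, RetryPolicy>();
    // Plain connection refusals are still retried at the YARN layer.
    exceptionToPolicy.put(ConnectException.class, basePolicy);
    // Socket connect timeouts are not retried here; fail fast at this level.
    exceptionToPolicy.put(ConnectTimeoutException.class,
        RetryPolicies.TRY_ONCE_THEN_FAIL);

    return RetryPolicies.retryByException(basePolicy, exceptionToPolicy);
  }
}
{code}
With a policy shaped like this, a ConnectTimeoutException surfaces to the caller after the IPC layer's own retries instead of being multiplied by a second retry loop at the YARN level.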
[jira] [Updated] (YARN-3237) AppLogAggregatorImpl fails to log error cause
[ https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated YARN-3237: - Attachment: YARN-3237-v2.patch Attaching a new patch to add the error cause to the doContainerLogAggregation method as well. AppLogAggregatorImpl fails to log error cause - Key: YARN-3237 URL: https://issues.apache.org/jira/browse/YARN-3237 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: YARN-3237-v2.patch, YARN-3237.patch AppLogAggregatorImpl fails to log the error if it is unable to create LogWriter. Below is the log output: [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: Cannot create writer for app app_id. Disabling log-aggregation for this app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
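For context, the kind of change being attached is essentially passing the caught exception to the logger so the cause and stack trace are preserved. A minimal hedged sketch (assuming commons-logging and a stand-in for the failing LogWriter creation; this is not the actual patch):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class LogCauseSketch {
  private static final Log LOG = LogFactory.getLog(LogCauseSketch.class);

  void aggregate(String appId) {
    try {
      // Stand-in for the LogWriter creation / container log aggregation
      // step that can throw in AppLogAggregatorImpl.
      throw new RuntimeException("simulated failure");
    } catch (Exception e) {
      // Passing 'e' as the second argument records the cause, instead of
      // only the bare message seen in the JIRA description.
      LOG.error("Cannot create writer for app " + appId
          + ". Disabling log-aggregation for this app.", e);
    }
  }
}
{code}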
[jira] [Commented] (YARN-3237) AppLogAggregatorImpl fails to log error cause
[ https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329655#comment-14329655 ] Hudson commented on YARN-3237: -- FAILURE: Integrated in Hadoop-trunk-Commit #7169 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7169/]) YARN-3237. AppLogAggregatorImpl fails to log error cause. Contributed by (xgong: rev f56c65bb3eb9436b67de2df63098e26589e70e56) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/CHANGES.txt AppLogAggregatorImpl fails to log error cause - Key: YARN-3237 URL: https://issues.apache.org/jira/browse/YARN-3237 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Fix For: 2.7.0 Attachments: YARN-3237-v2.patch, YARN-3237.patch AppLogAggregatorImpl fails to log the error if it is unable to create LogWriter. Below is the log output: [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: Cannot create writer for app app_id. Disabling log-aggregation for this app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3033) [Aggregator wireup] Implement NM starting the ATS writer companion
[ https://issues.apache.org/jira/browse/YARN-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329671#comment-14329671 ] Li Lu commented on YARN-3033: - Hi [~sjlee0], thanks for the comments! I think I used the wrong name, Application Level Aggregator inside RM, here. I think a more appropriate name would be aggregator inside RM for app data. I agree that we should strictly limit the total number of this type of aggregators (one per RM seems to be reasonable for now). We may want to reuse the implementation of the web server/data storage layer in the aggregator collection for this aggregator as well, by simply wrapping it in an aggregator collection? For the second point, yes, we can (and should always) reuse the same HBase client for all app level aggregators on the same node. [Aggregator wireup] Implement NM starting the ATS writer companion -- Key: YARN-3033 URL: https://issues.apache.org/jira/browse/YARN-3033 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Li Lu Attachments: MappingandlaunchingApplevelTimelineaggregators.pdf Per design in YARN-2928, implement node managers starting the ATS writer companion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3237) AppLogAggregatorImpl fails to log error cause
[ https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated YARN-3237: - Summary: AppLogAggregatorImpl fails to log error cause (was: AppLogAggregatorImpl fails to log error if it is unable to create LogWriter) AppLogAggregatorImpl fails to log error cause - Key: YARN-3237 URL: https://issues.apache.org/jira/browse/YARN-3237 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: YARN-3237.patch AppLogAggregatorImpl fails to log the error if it is unable to create LogWriter. Below is the log output: [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: Cannot create writer for app app_id. Disabling log-aggregation for this app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3237) AppLogAggregatorImpl fails to log error cause
[ https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329636#comment-14329636 ] Xuan Gong commented on YARN-3237: - +1 LGTM. Will commit AppLogAggregatorImpl fails to log error cause - Key: YARN-3237 URL: https://issues.apache.org/jira/browse/YARN-3237 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: YARN-3237-v2.patch, YARN-3237.patch AppLogAggregatorImpl fails to log the error if it is unable to create LogWriter. Below is the log output: [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: Cannot create writer for app app_id. Disabling log-aggregation for this app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3237) AppLogAggregatorImpl fails to log error if it is unable to create LogWriter
[ https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329492#comment-14329492 ] Rushabh S Shah commented on YARN-3237: -- There is one more method, AppLogAggregatorImpl#doContainerLogAggregation, which also doesn't log the exception. Do I need to create another JIRA, or should I add it to this patch? AppLogAggregatorImpl fails to log error if it is unable to create LogWriter --- Key: YARN-3237 URL: https://issues.apache.org/jira/browse/YARN-3237 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: YARN-3237.patch AppLogAggregatorImpl fails to log the error if it is unable to create LogWriter. Below is the log output: [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: Cannot create writer for app app_id. Disabling log-aggregation for this app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3237) AppLogAggregatorImpl fails to log error cause
[ https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329570#comment-14329570 ] Hadoop QA commented on YARN-3237: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699931/YARN-3237-v2.patch against trunk revision 8c6ae0d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6686//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6686//console This message is automatically generated. AppLogAggregatorImpl fails to log error cause - Key: YARN-3237 URL: https://issues.apache.org/jira/browse/YARN-3237 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: YARN-3237-v2.patch, YARN-3237.patch AppLogAggregatorImpl fails to log the error if it is unable to create LogWriter. Below is the log output: [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: Cannot create writer for app app_id. Disabling log-aggregation for this app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
Jason Lowe created YARN-3238: Summary: Connection timeouts to nodemanagers are retried at multiple levels Key: YARN-3238 URL: https://issues.apache.org/jira/browse/YARN-3238 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Priority: Blocker The IPC layer will retry connection timeouts automatically (see Client.java), but we are also retrying them with YARN's RetryPolicy put in place when the NM proxy is created. This causes a two-level retry mechanism where the IPC layer has already retried quite a few times (45 by default) for each YARN RetryPolicy error that is retried. The end result is that NM clients can wait a very, very long time for the connection to finally fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3033) [Aggregator wireup] Implement NM starting the ATS writer companion
[ https://issues.apache.org/jira/browse/YARN-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329748#comment-14329748 ] Sangjin Lee commented on YARN-3033: --- Agreed. Thanks for the clarification. In summary, I just want to make sure that we do not impose an app-level context for the RM's aggregator. [Aggregator wireup] Implement NM starting the ATS writer companion -- Key: YARN-3033 URL: https://issues.apache.org/jira/browse/YARN-3033 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Li Lu Attachments: MappingandlaunchingApplevelTimelineaggregators.pdf Per design in YARN-2928, implement node managers starting the ATS writer companion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2986) (Umbrella) Support hierarchical and unified scheduler configuration
[ https://issues.apache.org/jira/browse/YARN-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329949#comment-14329949 ] Sunil G commented on YARN-2986: --- Thank you [~leftnoteasy], this is a much-awaited ticket :) I have a few inputs on the same. 1. {noformat}
<policy-properties>
  <resource-calculator>
    org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
  </resource-calculator>
</policy-properties>
{noformat} and {noformat}
<policy-properties>
  <user-limit-factor>2</user-limit-factor>
</policy-properties>
{noformat} This is inside a queue. Do you mean that non-repeating items are kept outside the loop of queues and changing items are kept inside each queue? However, if I have only one set of user-limit, node-labels etc., and if I keep all of those policy-properties outside the *queue* section, then will it be applicable for all queues? If not, I suggest we can have a named policy-property concept. {noformat}
<queue name="default">
  <state>RUNNING</state>
  <acl_submit_applications>*</acl_submit_applications>
  <acl_administer_queue>*</acl_administer_queue>
  <accessible-node-labels>x</accessible-node-labels>
  <policy-properties>gpu</policy-properties>
</queue>
<queue name="queueA">
  <state>RUNNING</state>
  <acl_submit_applications>*</acl_submit_applications>
  <acl_administer_queue>*</acl_administer_queue>
  <accessible-node-labels>x</accessible-node-labels>
  <policy-properties>gpu</policy-properties>
</queue>
<policy-properties name="gpu">
  <user-limit-factor>2</user-limit-factor>
  <node-labels>
    <node-label name="x">
      <capacity>20</capacity>
      <maximum-capacity>50</maximum-capacity>
    </node-label>
  </node-labels>
</policy-properties>
{noformat} It can be shared as needed across queues, and it will make the queue part more readable. (Umbrella) Support hierarchical and unified scheduler configuration --- Key: YARN-2986 URL: https://issues.apache.org/jira/browse/YARN-2986 Project: Hadoop YARN Issue Type: Improvement Reporter: Vinod Kumar Vavilapalli Assignee: Wangda Tan Attachments: YARN-2986.1.patch Today's scheduler configuration is fragmented and non-intuitive, and needs to be improved. Details in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler
zhihai xu created YARN-3241: --- Summary: Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler Key: YARN-3241 URL: https://issues.apache.org/jira/browse/YARN-3241 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: zhihai xu Assignee: zhihai xu Leading spaces, trailing spaces and empty sub-queue names may cause a MetricsException (Metrics source XXX already exists!) when adding an application to FairScheduler. The reason is that QueueMetrics parses the queue name differently from the QueueManager. QueueMetrics uses Q_SPLITTER to parse the queue name; it removes leading and trailing spaces in each sub-queue name and also removes empty sub-queue names. {code} static final Splitter Q_SPLITTER = Splitter.on('.').omitEmptyStrings().trimResults(); {code} But QueueManager won't remove leading spaces, trailing spaces or empty sub-queue names. This causes FSQueue and FSQueueMetrics to get out of sync: QueueManager will think two queue names are different, so it will try to create a new queue, but FSQueueMetrics will treat the two queue names as the same, which produces the "Metrics source XXX already exists!" MetricsException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
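A small self-contained sketch of the mismatch described above; only the Q_SPLITTER line is taken from the description, and the queue name is hypothetical. It shows how the Guava splitter used by QueueMetrics normalizes a name that QueueManager would compare verbatim, so two different raw strings map to one metrics source name:
{code}
import com.google.common.base.Joiner;
import com.google.common.base.Splitter;

public class QueueNameSplitDemo {
  // Same splitter configuration as QueueMetrics' Q_SPLITTER.
  static final Splitter Q_SPLITTER =
      Splitter.on('.').omitEmptyStrings().trimResults();

  public static void main(String[] args) {
    // Hypothetical queue name with an inner space and a trailing empty segment.
    String rawQueueName = "root. queueA.";

    // QueueManager compares the raw string, so "root. queueA." and
    // "root.queueA" look like two different queues to it.
    System.out.println("raw       : [" + rawQueueName + "]");

    // QueueMetrics trims spaces and drops empty segments, so both names
    // collapse to the same metrics source name, triggering the
    // "Metrics source XXX already exists!" exception.
    String normalized = Joiner.on('.').join(Q_SPLITTER.split(rawQueueName));
    System.out.println("normalized: [" + normalized + "]"); // [root.queueA]
  }
}
{code}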
[jira] [Created] (YARN-3240) [Data Mode] Implement client API to put generic entities
Zhijie Shen created YARN-3240: - Summary: [Data Mode] Implement client API to put generic entities Key: YARN-3240 URL: https://issues.apache.org/jira/browse/YARN-3240 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3233) Implement scheduler common configuration parser and create an abstraction CapacityScheduler configuration layer to support plain/hierarchy configuration.
[ https://issues.apache.org/jira/browse/YARN-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3233: - Attachment: YARN-3233.1.patch Attached ver.1 patch; please share your thoughts. For an example, please refer to {{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/test-capacity-scheduler-hierarchy.xml}} in the patch. Implement scheduler common configuration parser and create an abstraction CapacityScheduler configuration layer to support plain/hierarchy configuration. - Key: YARN-3233 URL: https://issues.apache.org/jira/browse/YARN-3233 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3233.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3233) Implement scheduler common configuration parser and create an abstraction CapacityScheduler configuration layer to support plain/hierarchy configuration.
[ https://issues.apache.org/jira/browse/YARN-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329967#comment-14329967 ] Hadoop QA commented on YARN-3233: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699989/YARN-3233.1.patch against trunk revision 0d6af57. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 7 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6688//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6688//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6688//console This message is automatically generated. Implement scheduler common configuration parser and create an abstraction CapacityScheduler configuration layer to support plain/hierarchy configuration. - Key: YARN-3233 URL: https://issues.apache.org/jira/browse/YARN-3233 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3233.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3166) [Source organization] Decide detailed package structures for timeline service v2 components
[ https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329848#comment-14329848 ] Zhijie Shen commented on YARN-3166: --- bq. But as a general rule, I think there could be issues... Yeah, it could be, but hopefully it won't be significant. In YARN-3240, I've made a patch to add new client APIs into TimelineClient. Taking a close look at TimelineClientImpl, the code operating on the old data model is a relatively small piece, while we will need to carry over most of the skeleton code as well as part of the APIs, i.e. the DT operations. [Source organization] Decide detailed package structures for timeline service v2 components --- Key: YARN-3166 URL: https://issues.apache.org/jira/browse/YARN-3166 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Open this JIRA to track all discussions on detailed package structures for timeline services v2. This JIRA is for discussion only. For our current timeline service v2 design, aggregator (previously called writer) implementation is in hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.aggregator}} In YARN-2928's design, the next gen ATS reader is also a server. Maybe we want to put reader related implementations into hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.reader}} Both readers and aggregators will expose features that may be used by YARN and other 3rd party components, such as aggregator/reader APIs. For those features, maybe we would like to expose their interfaces to hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? Let's use this JIRA as a centralized place to track all related discussions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329998#comment-14329998 ] Mit Desai commented on YARN-3238: - +1 (non binding) Looks good to me Connection timeouts to nodemanagers are retried at multiple levels -- Key: YARN-3238 URL: https://issues.apache.org/jira/browse/YARN-3238 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: YARN-3238.001.patch The IPC layer will retry connection timeouts automatically (see Client.java), but we are also retrying them with YARN's RetryPolicy put in place when the NM proxy is created. This causes a two-level retry mechanism where the IPC layer has already retried quite a few times (45 by default) for each YARN RetryPolicy error that is retried. The end result is that NM clients can wait a very, very long time for the connection to finally fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329818#comment-14329818 ] Hadoop QA commented on YARN-3238: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699964/YARN-3238.001.patch against trunk revision f56c65b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6687//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6687//console This message is automatically generated. Connection timeouts to nodemanagers are retried at multiple levels -- Key: YARN-3238 URL: https://issues.apache.org/jira/browse/YARN-3238 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: YARN-3238.001.patch The IPC layer will retry connection timeouts automatically (see Client.java), but we are also retrying them with YARN's RetryPolicy put in place when the NM proxy is created. This causes a two-level retry mechanism where the IPC layer has already retried quite a few times (45 by default) for each YARN RetryPolicy error that is retried. The end result is that NM clients can wait a very, very long time for the connection to finally fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3240) [Data Mode] Implement client API to put generic entities
[ https://issues.apache.org/jira/browse/YARN-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3240: -- Attachment: YARN-3240.1.patch In this JIRA, I aim to make the basic client write APIs ready. Later on, we can add more advanced APIs to simplify putting predefined data, such as flows. Below is the summary of this patch: 1. Added two Java methods to TimelineClient to operate on the new generic timeline entity data object. One is for the blocking call, while the other is for the async call. The two methods wrap over the same REST HTTP API, but pass a different async param. At the server side, the aggregator can consume this param to determine whether to use a sync or async call to persist the timeline data. The client needs the resource URI and the context appID to know where the request should be sent and which conceptual per-app aggregator it should be routed to. This is blocked by YARN-3039; for now, I just leave them unset in the code. 2. Changed the endpoint at the per-node web service accordingly to keep the client and the server paired. 3. One more data object, TimelineEntities, which is a collection of TimelineEntity, is added to host multiple entities in one request to the aggregator. The rationale behind putting the new client API inside the existing timeline client is: 1. Most of the TimelineClientImpl code could be reused, including making HTTP calls, security, retry and so on. 2. Not all the client APIs will be deprecated, just those that operate on the old data model. For example, the delegation token related APIs may still stay. [Data Mode] Implement client API to put generic entities Key: YARN-3240 URL: https://issues.apache.org/jira/browse/YARN-3240 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3240.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
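For readers who have not opened the patch, here is a heavily hedged sketch of the shape point 1 describes; the method and type names are placeholders, not the signatures actually introduced by YARN-3240.1.patch:
{code}
// Illustrative shape only: one blocking and one async write method over the
// same REST endpoint, plus a collection object carrying multiple entities.
public abstract class TimelineWriteApiSketch {

  /** Blocking put: returns after the per-app aggregator has handled the write. */
  public abstract void putEntities(TimelineEntitiesSketch entities) throws Exception;

  /** Async put: same REST call with an async param, returning immediately. */
  public abstract void putEntitiesAsync(TimelineEntitiesSketch entities) throws Exception;

  /** Placeholder for the TimelineEntities collection described in point 3. */
  public static class TimelineEntitiesSketch {
    // would hold a list of the new generic timeline entity objects
  }
}
{code}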
[jira] [Commented] (YARN-2799) cleanup TestLogAggregationService based on the change in YARN-90
[ https://issues.apache.org/jira/browse/YARN-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329958#comment-14329958 ] zhihai xu commented on YARN-2799: - Thanks [~djp] for the valuable feedback and for committing the patch! Greatly appreciated. cleanup TestLogAggregationService based on the change in YARN-90 Key: YARN-2799 URL: https://issues.apache.org/jira/browse/YARN-2799 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Fix For: 2.7.0 Attachments: YARN-2799.000.patch, YARN-2799.001.patch, YARN-2799.002.patch cleanup TestLogAggregationService based on the change in YARN-90. The following code is added to setup in YARN-90: {code} dispatcher = createDispatcher(); appEventHandler = mock(EventHandler.class); dispatcher.register(ApplicationEventType.class, appEventHandler); {code} In this case, we should remove all this code from each test function to avoid duplication. The same goes for dispatcher.stop(), which is in tearDown: we can remove dispatcher.stop() from each test function as well, because it will always be called from tearDown for each test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params
[ https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-3239: - Assignee: Jian He WebAppProxy does not support a final tracking url which has query fragments and params --- Key: YARN-3239 URL: https://issues.apache.org/jira/browse/YARN-3239 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Examples of failures: Expected: {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}} Actual: {{http://uihost:8080}} Tried with a minor change to remove the #. Saw a different issue: Expected: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}} Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}} yarn application -status appId returns the expected value correctly. However, invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2986) (Umbrella) Support hierarchical and unified scheduler configuration
[ https://issues.apache.org/jira/browse/YARN-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329917#comment-14329917 ] Wangda Tan commented on YARN-2986: -- An update: after an offline discussion with [~vinodkv] and [~jianhe], the proposed configuration file now looks like: {code}
<scheduler>
  <type>capacity</type>
  <maximum-applications></maximum-applications>
  <queue-mappings></queue-mappings>
  <queue-mappings-override-enable></queue-mappings-override-enable>
  <maximum-am-resource-percent>0.3</maximum-am-resource-percent>
  <policy-properties>
    <resource-calculator>
      org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
    </resource-calculator>
  </policy-properties>
  <queue name="root">
    <queues>
      <queue name="default">
        <state>RUNNING</state>
        <acl_submit_applications>*</acl_submit_applications>
        <acl_administer_queue>*</acl_administer_queue>
        <accessible-node-labels>x</accessible-node-labels>
        <policy-properties>
          <user-limit-factor>2</user-limit-factor>
          <capacity>50</capacity>
          <maximum-capacity>90</maximum-capacity>
          <node-locality-delay>30</node-locality-delay>
          <node-labels>
            <node-label name="x">
              <capacity>20</capacity>
              <maximum-capacity>50</maximum-capacity>
            </node-label>
          </node-labels>
        </policy-properties>
      </queue>
    </queues>
  </queue>
</scheduler>
{code} One highlight of this proposal compared with the previous one: each configuration node contains a policy-properties section, which holds scheduler-specific configurations, like capacity in CapacityScheduler and minShare in FairScheduler, etc. (policy here means a particular kind of scheduling method). Other common options (which do not belong to a specific scheduler implementation) should be placed outside of policy-properties. *Please feel free to share your thoughts about this proposal :).* To move this forward, I filed several sub-tickets: YARN-3233 targets the definition and parsing of the configuration file (for the common scheduler and the capacity scheduler), and I will upload a patch right now; YARN-3234 is to handle Capacity Scheduler integration with the new config file. (Umbrella) Support hierarchical and unified scheduler configuration --- Key: YARN-2986 URL: https://issues.apache.org/jira/browse/YARN-2986 Project: Hadoop YARN Issue Type: Improvement Reporter: Vinod Kumar Vavilapalli Assignee: Wangda Tan Attachments: YARN-2986.1.patch Today's scheduler configuration is fragmented and non-intuitive, and needs to be improved. Details in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3194) RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node
[ https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-3194: - Summary: RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node (was: After NM restart, RM should handle NMCotainerStatuses sent by NM while registering if NM is Reconnected node) RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node --- Key: YARN-3194 URL: https://issues.apache.org/jira/browse/YARN-3194 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: NM restart is enabled Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-3194.patch, 0001-yarn-3194-v1.patch On NM restart, the NM sends all the outstanding NMContainerStatuses to the RM during registration. The registration can be treated by the RM as a New node or a Reconnecting node. The RM triggers the corresponding event on the basis of the node added or node reconnected state. # Node added event : Again here 2 scenarios can occur ## New node is registering with different ip:port – NOT A PROBLEM ## Old node is re-registering because of RESYNC command from RM when RM restarts – NOT A PROBLEM # Node reconnected event : ## Existing node is re-registering, i.e. RM treats it as a reconnecting node when RM is not restarted ### NM RESTART NOT Enabled – NOT A PROBLEM ### NM RESTART is Enabled Some applications are running on this node – *Problem is here* Zero applications are running on this node – NOT A PROBLEM Since the NMContainerStatuses are not handled, the RM never gets to know about completed containers and never releases the resources held by those containers. The RM will not allocate new containers for the pending resource requests until the completedContainer event is triggered. This results in applications waiting indefinitely because their pending container requests are not served by the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329020#comment-14329020 ] Hudson commented on YARN-933: - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #111 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/111/]) YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. Contributed by Rohith Sharmaks (jianhe: rev c0d9b93953767608dfe429ddb9bd4c1c3bd3debf) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING - Key: YARN-933 URL: https://issues.apache.org/jira/browse/YARN-933 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.5-alpha Reporter: J.Andreina Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch am max retries configured as 3 at client and RM side. Step 1: Install cluster with NM on 2 Machines Step 2: Make Ping using ip from RM machine to NM1 machine as successful ,But using Hostname should fail Step 3: Execute a job Step 4: After AM [ AppAttempt_1 ] allocation to NM1 machine is done , connection loss happened. Observation : == After AppAttempt_1 has moved to failed state ,release of container for AppAttempt_1 and Application removal are successful. New AppAttempt_2 is sponed. 1. Then again retry for AppAttempt_1 happens. 2. Again RM side it is trying to launch AppAttempt_1, hence fails with InvalidStateTransitonException 3. Client got exited after AppAttempt_1 is been finished [But actually job is still running ], while the appattempts configured is 3 and rest appattempts are all sponed and running. RMLogs: == 2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45 2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. 
finalState=FAILED 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35 2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02, 2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45 2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01. Got exception: java.lang.reflect.UndeclaredThrowableException 2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED at
[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329012#comment-14329012 ] Jason Lowe commented on YARN-3131: -- Actually, I just saw that YARN-3232 was filed, which proposes to stop exposing the NEW_SAVING and SUBMITTED states to clients. If we do that, then all we need to do in YarnClientImpl is have it throw when the non-NEW state is FAILED or KILLED, to indicate the submit failed. YarnClientImpl should check FAILED and KILLED state in submitApplication Key: YARN-3131 URL: https://issues.apache.org/jira/browse/YARN-3131 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: yarn_3131_v1.patch Just ran into an issue when submitting a job to a non-existent queue: YarnClient raises no exception. Though that job does get submitted successfully and just fails immediately afterwards, it would be better if YarnClient could handle the immediate-failure situation like YarnRunner does -- This message was sent by Atlassian JIRA (v6.3.4#6332)
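A minimal sketch of the check Jason outlines, assuming YARN-3232 hides NEW_SAVING and SUBMITTED so that the first non-NEW state already reflects the submission outcome. It uses the public YarnClient/ApplicationReport API, but the class name, polling interval and message are illustrative rather than the actual YarnClientImpl change:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class SubmitStateCheckSketch {
  // Poll until the application leaves NEW, then fail the submit call if the
  // state it landed in is FAILED or KILLED (e.g. rejected queue placement).
  static void waitForSubmitOutcome(YarnClient client, ApplicationId appId)
      throws Exception {
    while (true) {
      ApplicationReport report = client.getApplicationReport(appId);
      YarnApplicationState state = report.getYarnApplicationState();
      if (state == YarnApplicationState.NEW) {
        Thread.sleep(200);  // still being recorded by the RM, keep polling
        continue;
      }
      if (state == YarnApplicationState.FAILED
          || state == YarnApplicationState.KILLED) {
        throw new YarnException("Application " + appId
            + " failed at submission: " + report.getDiagnostics());
      }
      return;  // any other state means the submission was accepted
    }
  }
}
{code}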
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329084#comment-14329084 ] Junping Du commented on YARN-3039: -- Hi [~rkanter], thanks for sharing your thoughts here. I think that, as a generic, external service for YARN, YARN-913 may not meet our particular requirements here, like: - the timeline service will serve as a built-in service, so it is not necessary for applications to register the service explicitly - NMs also need the aggregators' info to aggregate info related to containers running on top of them. - We have a preference to bind the service to the local node of the AM container - Currently, the launching of NM aggregators is not done by way of a YARN service container (see YARN-3033) Also, I think we may not want this built-in service (as a standalone feature) to depend on another big feature in progress when unnecessary. Thoughts? [Aggregator wireup] Implement ATS writer service discovery -- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Robert Kanter Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329050#comment-14329050 ] Hudson commented on YARN-933: - FAILURE: Integrated in Hadoop-Mapreduce-trunk #2061 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2061/]) YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. Contributed by Rohith Sharmaks (jianhe: rev c0d9b93953767608dfe429ddb9bd4c1c3bd3debf) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING - Key: YARN-933 URL: https://issues.apache.org/jira/browse/YARN-933 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.5-alpha Reporter: J.Andreina Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch am max retries configured as 3 at client and RM side. Step 1: Install cluster with NM on 2 Machines Step 2: Make Ping using ip from RM machine to NM1 machine as successful ,But using Hostname should fail Step 3: Execute a job Step 4: After AM [ AppAttempt_1 ] allocation to NM1 machine is done , connection loss happened. Observation : == After AppAttempt_1 has moved to failed state ,release of container for AppAttempt_1 and Application removal are successful. New AppAttempt_2 is sponed. 1. Then again retry for AppAttempt_1 happens. 2. Again RM side it is trying to launch AppAttempt_1, hence fails with InvalidStateTransitonException 3. Client got exited after AppAttempt_1 is been finished [But actually job is still running ], while the appattempts configured is 3 and rest appattempts are all sponed and running. RMLogs: == 2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45 2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. 
finalState=FAILED 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35 2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02, 2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45 2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01. Got exception: java.lang.reflect.UndeclaredThrowableException 2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
[jira] [Updated] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3039: - Attachment: Service Binding for applicationaggregator of ATS (draft).pdf I have put some thoughts into a draft proposal here. Comments from everyone are welcome! [~rkanter], have you started the work on this JIRA? If not, do you mind if I take it over? Thanks! [Aggregator wireup] Implement ATS writer service discovery -- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Robert Kanter Attachments: Service Binding for applicationaggregator of ATS (draft).pdf Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329051#comment-14329051 ] Hudson commented on YARN-3076: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2061 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2061/]) YARN-3076. Add API/Implementation to YarnClient to retrieve label-to-node mapping (Varun Saxena via wangda) (wangda: rev d49ae725d5fa3eecf879ac42c42a368dd811f854) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java Add API/Implementation to YarnClient to retrieve label-to-node mapping -- Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3076.001.patch, YARN-3076.002.patch, YARN-3076.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3194) RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node
[ https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329048#comment-14329048 ] Hudson commented on YARN-3194: -- FAILURE: Integrated in Hadoop-trunk-Commit #7162 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7162/]) YARN-3194. RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node. Contributed by Rohith (jlowe: rev a64dd3d24bfcb9af21eb63869924f6482b147fd3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeReconnectEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node --- Key: YARN-3194 URL: https://issues.apache.org/jira/browse/YARN-3194 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: NM restart is enabled Reporter: Rohith Assignee: Rohith Priority: Blocker Fix For: 2.7.0 Attachments: 0001-YARN-3194.patch, 0001-yarn-3194-v1.patch On NM restart ,NM sends all the outstanding NMContainerStatus to RM during registration. The registration can be treated by RM as New node or Reconnecting node. RM triggers corresponding event on the basis of node added or node reconnected state. # Node added event : Again here 2 scenario's can occur ## New node is registering with different ip:port – NOT A PROBLEM ## Old node is re-registering because of RESYNC command from RM when RM restart – NOT A PROBLEM # Node reconnected event : ## Existing node is re-registering i.e RM treat it as reconnecting node when RM is not restarted ### NM RESTART NOT Enabled – NOT A PROBLEM ### NM RESTART is Enabled Some applications are running on this node – *Problem is here* Zero applications are running on this node – NOT A PROBLEM Since NMContainerStatus are not handled, RM never get to know about completedContainer and never release resource held be containers. RM will not allocate new containers for pending resource request as long as the completedContainer event is triggered. This results in applications to wait indefinitly because of pending containers are not served by RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3115) [Aggregator wireup] Work-preserving restarting of per-node aggregator
[ https://issues.apache.org/jira/browse/YARN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329103#comment-14329103 ] Junping Du commented on YARN-3115: -- Hi [~zjshen], have you already started the work on this? If not, I am quite familiar with NM work-preserving restart. Can I take this JIRA on? Thanks! [Aggregator wireup] Work-preserving restarting of per-node aggregator - Key: YARN-3115 URL: https://issues.apache.org/jira/browse/YARN-3115 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen YARN-3030 makes the per-node aggregator work as an aux service of the NM. It contains the states of the per-app aggregators corresponding to the running AM containers on this NM. When the NM is restarted in work-preserving mode, this per-node aggregator information needs to be carried over across the restart too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler
[ https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3241: Description: Leading space, trailing space and empty sub queue name may cause MetricsException(Metrics source XXX already exists! ) when add application to FairScheduler. The reason is because QueueMetrics parse the queue name different from the QueueManager. QueueMetrics use Q_SPLITTER to parse queue name, it will remove Leading space and trailing space in the sub queue name, It will also remove empty sub queue name. {code} static final Splitter Q_SPLITTER = Splitter.on('.').omitEmptyStrings().trimResults(); {code} But QueueManager won't remove Leading space, trailing space and empty sub queue name. This will cause out of sync between FSQueue and FSQueueMetrics. QueueManager will think two queue names are different so it will try to create a new queue. But FSQueueMetrics will treat these two queue names as same which will create Metrics source XXX already exists! MetricsException. was: Leading space, trailing space and empty sub queue name may cause MetricsException(Metrics source XXX already exists! ) when add application to FairScheduler. The reason is because QueueMetrics parse the queue name different from the QueueManager. QueueMetrics use Q_SPLITTER to parse queue name, it will remove Leading space and trailing space in the sub queue name, It will also remove empty sub queue name. {code} static final Splitter Q_SPLITTER = Splitter.on('.').omitEmptyStrings().trimResults(); {code} But QueueManager won't remove Leading space, trailing space and empty sub queue name. This will cause out of sync between FSQueue and FSQueueMetrics. QueueManager will think two queue names are different so it will try to create a new queue. But FSQueueMetrics will think these two queue names as same which will create Metrics source XXX already exists! MetricsException. Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler Key: YARN-3241 URL: https://issues.apache.org/jira/browse/YARN-3241 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: zhihai xu Assignee: zhihai xu Leading space, trailing space and empty sub queue name may cause MetricsException(Metrics source XXX already exists! ) when add application to FairScheduler. The reason is because QueueMetrics parse the queue name different from the QueueManager. QueueMetrics use Q_SPLITTER to parse queue name, it will remove Leading space and trailing space in the sub queue name, It will also remove empty sub queue name. {code} static final Splitter Q_SPLITTER = Splitter.on('.').omitEmptyStrings().trimResults(); {code} But QueueManager won't remove Leading space, trailing space and empty sub queue name. This will cause out of sync between FSQueue and FSQueueMetrics. QueueManager will think two queue names are different so it will try to create a new queue. But FSQueueMetrics will treat these two queue names as same which will create Metrics source XXX already exists! MetricsException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler
[ https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3241: Description: Leading space, trailing space and empty sub queue name may cause MetricsException(Metrics source XXX already exists! ) when add application to FairScheduler. The reason is because QueueMetrics parse the queue name different from the QueueManager. QueueMetrics use Q_SPLITTER to parse queue name, it will remove Leading space and trailing space in the sub queue name, It will also remove empty sub queue name. {code} static final Splitter Q_SPLITTER = Splitter.on('.').omitEmptyStrings().trimResults(); {code} But QueueManager won't remove Leading space, trailing space and empty sub queue name. This will cause out of sync between FSQueue and FSQueueMetrics. QueueManager will think two queue names are different so it will try to create a new queue. But FSQueueMetrics will treat these two queue names as same queue which will create Metrics source XXX already exists! MetricsException. was: Leading space, trailing space and empty sub queue name may cause MetricsException(Metrics source XXX already exists! ) when add application to FairScheduler. The reason is because QueueMetrics parse the queue name different from the QueueManager. QueueMetrics use Q_SPLITTER to parse queue name, it will remove Leading space and trailing space in the sub queue name, It will also remove empty sub queue name. {code} static final Splitter Q_SPLITTER = Splitter.on('.').omitEmptyStrings().trimResults(); {code} But QueueManager won't remove Leading space, trailing space and empty sub queue name. This will cause out of sync between FSQueue and FSQueueMetrics. QueueManager will think two queue names are different so it will try to create a new queue. But FSQueueMetrics will treat these two queue names as same which will create Metrics source XXX already exists! MetricsException. Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler Key: YARN-3241 URL: https://issues.apache.org/jira/browse/YARN-3241 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: zhihai xu Assignee: zhihai xu Leading space, trailing space and empty sub queue name may cause MetricsException(Metrics source XXX already exists! ) when add application to FairScheduler. The reason is because QueueMetrics parse the queue name different from the QueueManager. QueueMetrics use Q_SPLITTER to parse queue name, it will remove Leading space and trailing space in the sub queue name, It will also remove empty sub queue name. {code} static final Splitter Q_SPLITTER = Splitter.on('.').omitEmptyStrings().trimResults(); {code} But QueueManager won't remove Leading space, trailing space and empty sub queue name. This will cause out of sync between FSQueue and FSQueueMetrics. QueueManager will think two queue names are different so it will try to create a new queue. But FSQueueMetrics will treat these two queue names as same queue which will create Metrics source XXX already exists! MetricsException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
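To make the mismatch described in YARN-3241 concrete, here is a minimal, self-contained sketch (the class name and sample queue string are illustrative; only the Q_SPLITTER definition is taken from the description above). It shows how the Guava splitter used by QueueMetrics normalizes a queue path that a plain, non-trimming split does not, so the two components can disagree about whether two strings denote the same queue.
{code}
import com.google.common.base.Splitter;
import java.util.List;

public class QueueNameSplitDemo {
  // Same splitter definition as quoted from QueueMetrics in the description above.
  static final Splitter Q_SPLITTER =
      Splitter.on('.').omitEmptyStrings().trimResults();

  public static void main(String[] args) {
    String raw = "root. q1.";                        // leading space in the sub queue name, trailing dot
    List<String> trimmed = Q_SPLITTER.splitToList(raw);
    String[] plain = raw.split("\\.");               // keeps " q1" and does not trim

    System.out.println(trimmed);                     // [root, q1] -> metrics treat it as root.q1
    System.out.println(plain[1]);                    // " q1"      -> the queue manager sees a different name
  }
}
{code}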
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329136#comment-14329136 ] Rohith commented on YARN-3222: -- I see there are 2 ways of fixing the issue. # Always send the NODE_RESOURCE_UPDATE event to the scheduler via RMNodeEventType.RESOURCE_UPDATE of RMNode # When the NODE_ADDED event is sent to the scheduler, sending another NODE_RESOURCE_UPDATE event for the same node in ReconnectedNodeTransition is a duplicate update request, because the scheduler has already updated its resources with the newly added node, i.e. NODE_REMOVED--NODE_ADDED--NODE_RESOURCE_UPDATE. So if NO applications are running on the node, it is not required to send the node_resource_update request. I would prefer the 2nd option because one duplicate resource update can be optimized away. RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order --- Key: YARN-3222 URL: https://issues.apache.org/jira/browse/YARN-3222 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Priority: Critical When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the scheduler with the events node_added, node_removed or node_resource_update. These events should be sent in sequential order, i.e. the node_added event followed by the node_resource_update event. But if the node is reconnected with a different http port, the order of scheduler events is node_removed -- node_resource_update -- node_added, which causes the scheduler to not find the node, throw an NPE, and the RM to exit. The node_resource_update event should always be triggered via RMNodeEventType.RESOURCE_UPDATE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
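As an illustration of the ordering constraint discussed above (this is not RM code; the event names simply mirror the JIRA discussion), a reconnect with a changed http port has to put NODE_ADDED before NODE_RESOURCE_UPDATE, otherwise the update refers to a node the scheduler no longer tracks:
{code}
import java.util.function.Consumer;

public class ReconnectOrderingSketch {
  // Illustrative event names only; the real RM uses dedicated scheduler event classes.
  enum SchedulerEvent { NODE_REMOVED, NODE_ADDED, NODE_RESOURCE_UPDATE }

  static void reconnectWithNewHttpPort(Consumer<SchedulerEvent> scheduler) {
    scheduler.accept(SchedulerEvent.NODE_REMOVED);          // drop the stale node entry
    scheduler.accept(SchedulerEvent.NODE_ADDED);            // register the reconnected node first
    scheduler.accept(SchedulerEvent.NODE_RESOURCE_UPDATE);  // only then adjust its resource
  }

  public static void main(String[] args) {
    reconnectWithNewHttpPort(e -> System.out.println("scheduler handles " + e));
  }
}
{code}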
[jira] [Commented] (YARN-2799) cleanup TestLogAggregationService based on the change in YARN-90
[ https://issues.apache.org/jira/browse/YARN-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329209#comment-14329209 ] Junping Du commented on YARN-2799: -- +1. Committing it now. cleanup TestLogAggregationService based on the change in YARN-90 Key: YARN-2799 URL: https://issues.apache.org/jira/browse/YARN-2799 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Attachments: YARN-2799.000.patch, YARN-2799.001.patch, YARN-2799.002.patch Clean up TestLogAggregationService based on the change in YARN-90. The following code is added to setup in YARN-90: {code} dispatcher = createDispatcher(); appEventHandler = mock(EventHandler.class); dispatcher.register(ApplicationEventType.class, appEventHandler); {code} In this case, we should remove all this code from each test function to avoid duplicate code. The same goes for dispatcher.stop(), which is in tearDown; we can remove dispatcher.stop() from each test function as well because it will always be called from tearDown for each test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
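A minimal sketch of the cleanup pattern the description asks for, assuming JUnit 4 and Mockito; the class and the small stand-in types below are illustrative, not the actual TestLogAggregationService code. The dispatcher and mocked handler are created once in @Before and stopped once in @After, so the individual tests no longer repeat that code.
{code}
import static org.mockito.Mockito.mock;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class LogAggregationCleanupSketch {
  // Minimal stand-ins so the sketch compiles on its own.
  interface EventHandler { void handle(Object event); }
  static class FakeDispatcher {
    void register(EventHandler handler) { }
    void stop() { }
  }

  private FakeDispatcher dispatcher;
  private EventHandler appEventHandler;

  @Before
  public void setup() {
    dispatcher = new FakeDispatcher();
    appEventHandler = mock(EventHandler.class);
    dispatcher.register(appEventHandler);   // registered once here instead of in every test
  }

  @After
  public void tearDown() {
    dispatcher.stop();                      // tests never need to call stop() themselves
  }

  @Test
  public void testSomething() {
    // uses dispatcher and appEventHandler directly
  }
}
{code}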
[jira] [Commented] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit
[ https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329194#comment-14329194 ] Hadoop QA commented on YARN-2083: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650950/YARN-2083-3.patch against trunk revision a64dd3d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6682//console This message is automatically generated. In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit --- Key: YARN-2083 URL: https://issues.apache.org/jira/browse/YARN-2083 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.3.0 Reporter: Yi Tian Labels: fairscheduler Attachments: YARN-2083-1.patch, YARN-2083-2.patch, YARN-2083-3.patch, YARN-2083.patch In fair scheduler, FSParentQueue and FSLeafQueue do an assignContainerPreCheck to guaranty this queue is not over its limit. But the fitsIn function in Resource.java did not return false when the usedResource equals the maxResource. I think we should create a new Function fitsInWithoutEqual instead of fitsIn in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
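The fitsIn change proposed in YARN-2083 can be summarized with a small, self-contained sketch (fitsInWithoutEqual is the name suggested in the description; the helpers below work on plain memory/vcore numbers rather than Hadoop's Resource class). A check that allows equality still passes once usage has reached the maximum, while the strict variant blocks further assignment:
{code}
public class FitsInSketch {
  /** fitsIn-style check: equal usage still "fits", so the pre-check lets one more container through. */
  static boolean fitsIn(long usedMem, long usedVcores, long maxMem, long maxVcores) {
    return usedMem <= maxMem && usedVcores <= maxVcores;
  }

  /** Proposed strict variant: once usage reaches the maximum, stop assigning. */
  static boolean fitsInWithoutEqual(long usedMem, long usedVcores, long maxMem, long maxVcores) {
    return usedMem < maxMem && usedVcores < maxVcores;
  }

  public static void main(String[] args) {
    // Queue already at its 8192 MB / 4 vcore maximum:
    System.out.println(fitsIn(8192, 4, 8192, 4));             // true  -> over-assignment is possible
    System.out.println(fitsInWithoutEqual(8192, 4, 8192, 4)); // false -> pre-check blocks further containers
  }
}
{code}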
[jira] [Updated] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3131: --- Attachment: yarn_3131_v2.patch YarnClientImpl should check FAILED and KILLED state in submitApplication Key: YARN-3131 URL: https://issues.apache.org/jira/browse/YARN-3131 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch Just run into a issue when submit a job into a non-existent queue and YarnClient raise no exception. Though that job indeed get submitted successfully and just failed immediately after, it will be better if YarnClient can handle the immediate fail situation like YarnRunner does -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3235) Support uniformed scheduler configuration in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329179#comment-14329179 ] Karthik Kambatla commented on YARN-3235: It is all yours, [~Naganarasimha]. Thanks for checking in. Support uniformed scheduler configuration in FairScheduler -- Key: YARN-3235 URL: https://issues.apache.org/jira/browse/YARN-3235 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Naganarasimha G R -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2799) cleanup TestLogAggregationService based on the change in YARN-90
[ https://issues.apache.org/jira/browse/YARN-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329187#comment-14329187 ] Hadoop QA commented on YARN-2799: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12698939/YARN-2799.002.patch against trunk revision a64dd3d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6681//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6681//console This message is automatically generated. cleanup TestLogAggregationService based on the change in YARN-90 Key: YARN-2799 URL: https://issues.apache.org/jira/browse/YARN-2799 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Attachments: YARN-2799.000.patch, YARN-2799.001.patch, YARN-2799.002.patch cleanup TestLogAggregationService based on the change in YARN-90. The following code is added to setup in YARN-90, {code} dispatcher = createDispatcher(); appEventHandler = mock(EventHandler.class); dispatcher.register(ApplicationEventType.class, appEventHandler); {code} In this case, we should remove all these code from each test function to avoid duplicate code. Same for dispatcher.stop() which is in tearDown, we can remove dispatcher.stop() from from each test function also because it will always be called from tearDown for each test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2799) cleanup TestLogAggregationService based on the change in YARN-90
[ https://issues.apache.org/jira/browse/YARN-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329224#comment-14329224 ] Hudson commented on YARN-2799: -- FAILURE: Integrated in Hadoop-trunk-Commit #7163 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7163/]) YARN-2799. Cleanup TestLogAggregationService based on the change in YARN-90. Contributed by Zhihai Xu (junping_du: rev c33ae271c24f0770c9735ccd2086cafda4f4e0b2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/CHANGES.txt cleanup TestLogAggregationService based on the change in YARN-90 Key: YARN-2799 URL: https://issues.apache.org/jira/browse/YARN-2799 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Fix For: 2.7.0 Attachments: YARN-2799.000.patch, YARN-2799.001.patch, YARN-2799.002.patch cleanup TestLogAggregationService based on the change in YARN-90. The following code is added to setup in YARN-90, {code} dispatcher = createDispatcher(); appEventHandler = mock(EventHandler.class); dispatcher.register(ApplicationEventType.class, appEventHandler); {code} In this case, we should remove all these code from each test function to avoid duplicate code. Same for dispatcher.stop() which is in tearDown, we can remove dispatcher.stop() from from each test function also because it will always be called from tearDown for each test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit
[ https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2083: --- Labels: fairscheduler (was: assignContainer fair scheduler) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit --- Key: YARN-2083 URL: https://issues.apache.org/jira/browse/YARN-2083 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.3.0 Reporter: Yi Tian Labels: fairscheduler Attachments: YARN-2083-1.patch, YARN-2083-2.patch, YARN-2083-3.patch, YARN-2083.patch In fair scheduler, FSParentQueue and FSLeafQueue do an assignContainerPreCheck to guaranty this queue is not over its limit. But the fitsIn function in Resource.java did not return false when the usedResource equals the maxResource. I think we should create a new Function fitsInWithoutEqual instead of fitsIn in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good again
[ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329225#comment-14329225 ] Hudson commented on YARN-90: FAILURE: Integrated in Hadoop-trunk-Commit #7163 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7163/]) YARN-2799. Cleanup TestLogAggregationService based on the change in YARN-90. Contributed by Zhihai Xu (junping_du: rev c33ae271c24f0770c9735ccd2086cafda4f4e0b2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/CHANGES.txt NodeManager should identify failed disks becoming good again Key: YARN-90 URL: https://issues.apache.org/jira/browse/YARN-90 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Ravi Gummadi Assignee: Varun Vasudev Fix For: 2.6.0 Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, apache-yarn-90.10.patch, apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch, apache-yarn-90.5.patch, apache-yarn-90.6.patch, apache-yarn-90.7.patch, apache-yarn-90.8.patch, apache-yarn-90.9.patch MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes down, it is marked as failed forever. To reuse that disk (after it becomes good), NodeManager needs restart. This JIRA is to improve NodeManager to reuse good disks(which could be bad some time back). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3166) [Source organization] Decide detailed package structures for timeline service v2 components
[ https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329598#comment-14329598 ] Zhijie Shen commented on YARN-3166: --- bq. In such a scenario if we try to Reuse REST then we might end up with lot of If else... I'm not sure if I understand your question correctly, but I think we may not end up with a lot of if/else. The proposal is: 1. In TimelineClient, we keep the existing methods that operate on the old data model, but mark them deprecated individually. 2. In TimelineClient, we create new methods that operate on the new data model. The methods operating on the new and old data models are kept separate. [Source organization] Decide detailed package structures for timeline service v2 components --- Key: YARN-3166 URL: https://issues.apache.org/jira/browse/YARN-3166 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Open this JIRA to track all discussions on detailed package structures for timeline services v2. This JIRA is for discussion only. For our current timeline service v2 design, the aggregator (previously called writer) implementation is in hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.aggregator}} In YARN-2928's design, the next gen ATS reader is also a server. Maybe we want to put reader-related implementations into hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.reader}} Both readers and aggregators will expose features that may be used by YARN and other 3rd party components, such as aggregator/reader APIs. For those features, maybe we would like to expose their interfaces in hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? Let's use this JIRA as a centralized place to track all related discussions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
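A hedged illustration of the proposal above; the class and method names are invented for the sketch and are not the real TimelineClient API. The old methods stay but are deprecated individually, and new methods for the v2 data model sit next to them, so the shared skeleton (REST wrapper, security, retry) is reused without a configuration-driven if/else fork.
{code}
public abstract class TimelineClientSketch {
  // Placeholder data-model types so the sketch stands alone.
  public static class OldTimelineEntity { }
  public static class NewTimelineEntity { }

  /** v1-era method, kept for compatibility but deprecated individually. */
  @Deprecated
  public abstract void putEntities(OldTimelineEntity... entities);

  /** v2 counterpart operating on the new data model; shares the same transport underneath. */
  public abstract void putEntitiesV2(NewTimelineEntity... entities);
}
{code}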
[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329275#comment-14329275 ] Junping Du commented on YARN-3223: -- Hi [~varun_saxena], I would be glad if you can work on this JIRA. :) My recommendation here is just that you check some of the sub-tasks under YARN-291, which are related to the work here. Resource update during NM graceful decommission --- Key: YARN-3223 URL: https://issues.apache.org/jira/browse/YARN-3223 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Junping Du Assignee: Varun Saxena During NM graceful decommission, we should handle resource updates properly, including: making RMNode keep track of the old resource for possible rollback, keeping the available resource at 0, and updating the used resource when containers finish. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329324#comment-14329324 ] Wangda Tan commented on YARN-3230: -- Committing.. Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch, YARN-3230.2.patch, YARN-3230.3.patch, YARN-3230.3.patch, application page.png Today, application states are simply surfaced as a single word on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This jira is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329334#comment-14329334 ] Hudson commented on YARN-3230: -- SUCCESS: Integrated in Hadoop-trunk-Commit #7164 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7164/]) YARN-3230. Clarify application states on the web UI. (Jian He via wangda) (wangda: rev ce5bf927c3d9f212798de1bf8706e5e9def235a1) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Fix For: 2.7.0 Attachments: YARN-3230.1.patch, YARN-3230.2.patch, YARN-3230.3.patch, YARN-3230.3.patch, application page.png Today, application state are simply surfaced as a single word on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This jira is to clarify the meaning of these states, things like what the application is waiting for at this state. In addition,the difference between application state and FinalStatus are fairly confusing to users, especially when state=FINISHED, but FinalStatus=FAILED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3197: --- Attachment: YARN-3197.002.patch Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Attachments: YARN-3197.001.patch, YARN-3197.002.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3115) [Aggregator wireup] Work-preserving restarting of per-node aggregator
[ https://issues.apache.org/jira/browse/YARN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329299#comment-14329299 ] Zhijie Shen commented on YARN-3115: --- Please feel free to take it over. [Aggregator wireup] Work-preserving restarting of per-node aggregator - Key: YARN-3115 URL: https://issues.apache.org/jira/browse/YARN-3115 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen YARN-3030 makes the per-node aggregator work as the aux service of a NM. It contains the states of the per-app aggregators corresponding to the running AM containers on this NM. While NM is restarted in work-preserving mode, this information of per-node aggregator needs to be carried on over restarting too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329308#comment-14329308 ] Hadoop QA commented on YARN-3131: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699905/yarn_3131_v2.patch against trunk revision c33ae27. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6683//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6683//console This message is automatically generated. YarnClientImpl should check FAILED and KILLED state in submitApplication Key: YARN-3131 URL: https://issues.apache.org/jira/browse/YARN-3131 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch Just run into a issue when submit a job into a non-existent queue and YarnClient raise no exception. Though that job indeed get submitted successfully and just failed immediately after, it will be better if YarnClient can handle the immediate fail situation like YarnRunner does -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329356#comment-14329356 ] Vinod Kumar Vavilapalli commented on YARN-2423: --- Apologies for coming in late, but should we rewrite this to be inline with YARN-2928? Given YARN-2928, I don't see an immediate value in having the patch as is (with a new API that nobody uses in the interim) and then rewrite it once more for YARN-2928. I know considerable effort has been spent on this, but what do you folks think? TimelineClient should wrap all GET APIs to facilitate Java users Key: YARN-2423 URL: https://issues.apache.org/jira/browse/YARN-2423 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Attachments: YARN-2423.004.patch, YARN-2423.005.patch, YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, YARN-2423.patch TimelineClient provides the Java method to put timeline entities. It's also good to wrap over all GET APIs (both entity and domain), and deserialize the json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329358#comment-14329358 ] Wangda Tan commented on YARN-1963: -- That's great! Thanks. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329368#comment-14329368 ] Zhijie Shen commented on YARN-3031: --- [~vrushalic], YARN-3041 is committed, would you mind updating this patch accordingly? And we'd like to review it again. Thanks! [Storage abstraction] Create backing storage write interface for ATS writers Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Sequence_diagram_write_interaction.2.png, Sequence_diagram_write_interaction.png, YARN-3031.01.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329375#comment-14329375 ] Vrushali C commented on YARN-3031: -- Yes, I will.. [Storage abstraction] Create backing storage write interface for ATS writers Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Sequence_diagram_write_interaction.2.png, Sequence_diagram_write_interaction.png, YARN-3031.01.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2495: - Target Version/s: 2.8.0 (was: 2.6.0) Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329380#comment-14329380 ] Chang Li commented on YARN-3131: Updated the patch, fixed the unit test and improved the code style. Thanks to both [~jlowe] and [~hitesh] for providing valuable opinions. [~hitesh] We could open new jira(s) to address the issue of confirming the AM gets launched. The current fix could be a short-term solution because the transition from the submitted state to the accepted state won't take too long, as explained by Jason. We are not polling for the Running/Failed state. It's not optimal, but the situation of submitting work to an incorrect queue isn't common either. [~zjshen] [~jianhe] Could you please kindly also comment on this problem? Thanks. YarnClientImpl should check FAILED and KILLED state in submitApplication Key: YARN-3131 URL: https://issues.apache.org/jira/browse/YARN-3131 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch Just ran into an issue where submitting a job to a non-existent queue raises no exception from YarnClient. Though that job indeed gets submitted successfully and just fails immediately after, it would be better if YarnClient could handle the immediate-failure situation like YarnRunner does -- This message was sent by Atlassian JIRA (v6.3.4#6332)
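For readers following the discussion, here is a hedged sketch of the client-side behavior being proposed, written against the public YarnClient API (the loop is illustrative and is not the actual YarnClientImpl patch): after submission, poll the application report until it leaves the NEW/NEW_SAVING/SUBMITTED states, and surface FAILED or KILLED immediately instead of returning as if the submission had succeeded.
{code}
import java.io.IOException;
import java.util.EnumSet;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class SubmitCheckSketch {
  private static final EnumSet<YarnApplicationState> PENDING = EnumSet.of(
      YarnApplicationState.NEW,
      YarnApplicationState.NEW_SAVING,
      YarnApplicationState.SUBMITTED);

  /** Waits until the RM has accepted, or rejected, the freshly submitted application. */
  static void waitForAcceptance(YarnClient client, ApplicationId appId)
      throws YarnException, IOException, InterruptedException {
    while (true) {
      ApplicationReport report = client.getApplicationReport(appId);
      YarnApplicationState state = report.getYarnApplicationState();
      if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
        throw new YarnException("Application " + appId + " was rejected: "
            + report.getDiagnostics());
      }
      if (!PENDING.contains(state)) {
        return; // ACCEPTED or beyond: the submission is genuinely done
      }
      Thread.sleep(200);
    }
  }
}
{code}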
[jira] [Commented] (YARN-3166) [Source organization] Decide detailed package structures for timeline service v2 components
[ https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329393#comment-14329393 ] Naganarasimha G R commented on YARN-3166: - Hi [~zjshen] [~sjlee0] bq. I'm not aware that RM has something similar to NM aux service to decouple RM and aggregator, too. +1 for this approach; I was thinking along the same lines too, and had actually raised a jira, YARN-2267, for aux service support in the RM for some other functionality. bq. The benefit is that we can reuse the whole skeleton, including http rest wrapper, security and retry code, but just need to handle some different data objects, and direct the request to a different location. Instead of deprecate the whole class, we can deprecate the individual methods. Thought? One problem I can think of: as per Sangjin's comments, until we remove ATS v1 we need to support both, with configuration enabling one or the other. In such a scenario, if we try to reuse the REST wrapper we might end up with a lot of if/else... [Source organization] Decide detailed package structures for timeline service v2 components --- Key: YARN-3166 URL: https://issues.apache.org/jira/browse/YARN-3166 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Open this JIRA to track all discussions on detailed package structures for timeline services v2. This JIRA is for discussion only. For our current timeline service v2 design, the aggregator (previously called writer) implementation is in hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.aggregator}} In YARN-2928's design, the next gen ATS reader is also a server. Maybe we want to put reader-related implementations into hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.reader}} Both readers and aggregators will expose features that may be used by YARN and other 3rd party components, such as aggregator/reader APIs. For those features, maybe we would like to expose their interfaces in hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? Let's use this JIRA as a centralized place to track all related discussions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329410#comment-14329410 ] Hadoop QA commented on YARN-3197: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699909/YARN-3197.002.patch against trunk revision c33ae27. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6684//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6684//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6684//console This message is automatically generated. Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Attachments: YARN-3197.001.patch, YARN-3197.002.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan resolved YARN-3003. -- Resolution: Duplicate [~varun_saxena], Thanks for the reminder. I just reopened and then resolved this as a duplicate; since the patch of this JIRA was divided into two other JIRAs, there's no code actually committed for this one. Provide API for client to retrieve label to node mapping Key: YARN-3003 URL: https://issues.apache.org/jira/browse/YARN-3003 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Ted Yu Assignee: Varun Saxena Attachments: YARN-3003.001.patch, YARN-3003.002.patch Currently YarnClient#getNodeToLabels() returns the mapping from NodeId to the set of labels associated with the node. A client (such as Slider) may be interested in the label-to-node mapping - given a label, return the nodes with this label. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reopened YARN-3003: -- Provide API for client to retrieve label to node mapping Key: YARN-3003 URL: https://issues.apache.org/jira/browse/YARN-3003 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Ted Yu Assignee: Varun Saxena Attachments: YARN-3003.001.patch, YARN-3003.002.patch Currently YarnClient#getNodeToLabels() returns the mapping from NodeId to set of labels associated with the node. Client (such as Slider) may be interested in label to node mapping - given label, return the nodes with this label. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329376#comment-14329376 ] Naganarasimha G R commented on YARN-2495: - Hi [~wangda], Thanks for discussing and summarizing this. A few points to discuss: # YARN-2923 (configuration-based NodeLabelProviderService): what is the scope of this jira, as it is again one more classification under {{distributed}}? Is it required? # Apart from Vinod's suggestion, I would like to add one more change: in some jiras we were discussing a fail-fast approach, i.e. in our case, if we configure script-based distributed configuration and the script file doesn't exist, permissions are not sufficient, etc., then the NM should fail to start. # YARN-2980 is mostly just moving NodeHealthScriptRunner to Hadoop Common, and the code in it differs a lot from what we require. If you remember, earlier I had refactored that code to make it reusable for our scenario, but you had suggested not to touch the existing NodeHealthScriptRunner and to make ScriptBasedNodeLabelsProvider similar to NodeHealthScript, serving the needs of node labels. # The earlier target version was mentioned as 2.6; do we modify it to 2.7 or 2.8? Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329399#comment-14329399 ] Wangda Tan commented on YARN-2495: -- For your questions: 1. It's not the first priority; let's just keep it here and move on with the script one. 2. That makes sense, and could you take a look at the health check script? Is it the same as you mentioned? (I guess so.) 3. I can remember that :), makes sense to me. If any merge is needed, we can tackle it in a separate JIRA. 4. I just marked it for 2.8; since 2.7 is planned to be released within 2 weeks, I think it will be hard to fit into that timeframe. IIRC, this JIRA contains an abstract node label provider only. I suggest not adding more options for configuring the node label provider for now (say, set provider=script-based-provider), to avoid potentially unnecessary switches. For now, as we discussed, you can leave an empty provider implementation (but it needs a proper impl to do tests); the script-based provider can be addressed separately. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers: - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using a script as suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
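To make the plan above concrete, a hypothetical sketch of an abstract node labels provider running in the NM is shown below. The class name, config key, and method names are illustrative assumptions and not the committed YARN-2495 API; concrete subclasses (config-based or script-based) would supply fetchLabels(), and the NM would forward getNodeLabels() to the RM through the ResourceTracker register/heartbeat path described in the JIRA summary.

{code:java}
import java.util.Collections;
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

// Hypothetical base class only: not the committed provider interface.
public abstract class NodeLabelsProviderSketch extends AbstractService {

  private volatile Set<String> labels = Collections.emptySet();
  private Timer timer;
  private long intervalMs;

  protected NodeLabelsProviderSketch(String name) {
    super(name);
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Illustrative config key, not a real yarn-site.xml property.
    intervalMs = conf.getLong("nm.node-labels.fetch-interval-ms", 60000L);
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    // Periodically refresh labels; a fail-fast provider could instead throw
    // here when its script/config is missing, so the NM refuses to start.
    timer = new Timer("NodeLabelsFetcher", true);
    timer.scheduleAtFixedRate(new TimerTask() {
      @Override
      public void run() {
        labels = fetchLabels();
      }
    }, 0, intervalMs);
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    if (timer != null) {
      timer.cancel();
    }
    super.serviceStop();
  }

  /** Called by the NM heartbeat path to pick up the latest labels. */
  public Set<String> getNodeLabels() {
    return labels;
  }

  /** Concrete providers (config-based, script-based, ...) implement this. */
  protected abstract Set<String> fetchLabels();
}
{code}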
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329425#comment-14329425 ] Wangda Tan commented on YARN-3197: -- I think "non-alive container" is not correct, since all completed containers reported from the NM are non-alive; I suggest saying something like {{containerId=xx completed with status=yyy from completed or unknown application id=zzz}} instead. And I suggest improving the following log a little bit, as I suggested here: https://issues.apache.org/jira/browse/YARN-3197?focusedCommentId=14326344page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14326344 {{If the RM can get the RMContainer, the application is definitely not unknown; it should indicate the application may be completed as well.}} Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Attachments: YARN-3197.001.patch, YARN-3197.002.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
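For illustration, a sketch of the suggested log wording is shown below. The class and method shape are placeholders, not the actual CapacityScheduler.completedContainer() code or the committed patch.

{code:java}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;

// Placeholder class: only the log message mirrors the suggestion above.
class CompletedContainerLogSketch {
  private static final Log LOG =
      LogFactory.getLog(CompletedContainerLogSketch.class);

  void completedContainer(RMContainer rmContainer, ContainerStatus status) {
    if (rmContainer == null) {
      // Replace the bare "Null container completed..." line with the ids and
      // exit status the NM actually reported.
      LOG.info("containerId=" + status.getContainerId()
          + " completed with status=" + status.getExitStatus()
          + " from completed or unknown application id="
          + status.getContainerId().getApplicationAttemptId().getApplicationId());
      return;
    }
    // ... normal completion handling would continue here ...
  }
}
{code}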
[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329856#comment-14329856 ] Wangda Tan commented on YARN-3131: -- Hi [~lichangleo], Thanks for working on this; some (minor) comments: 1) There are several duplicated calls of {{getApplicationReport}} in YarnClientImpl; you can merge them into a single one. 2) The changes to TestApplicationClientProtocolOnHA actually changed the behavior of the test; it's better to fix the test instead. 3) Code style: I noticed tabs and spaces mixed in the patch. Wangda YarnClientImpl should check FAILED and KILLED state in submitApplication Key: YARN-3131 URL: https://issues.apache.org/jira/browse/YARN-3131 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch Just ran into an issue where a job submitted to a non-existent queue raises no exception from YarnClient. Though that job indeed gets submitted successfully and just fails immediately after, it would be better if YarnClient could handle the immediate-failure situation the way YarnRunner does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
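As an illustration of the behavior this JIRA asks for, the sketch below polls the application report after submission and fails fast on FAILED or KILLED. It is a standalone example built only on public YarnClient calls, not the YarnClientImpl patch itself; the helper class and polling interval are assumptions.

{code:java}
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

class SubmitAndCheckSketch {
  static void waitForAccepted(YarnClient client, ApplicationId appId)
      throws YarnException, IOException, InterruptedException {
    while (true) {
      // One getApplicationReport call per iteration, reused for all checks
      // (the review comment above asks to merge duplicated calls like this).
      ApplicationReport report = client.getApplicationReport(appId);
      YarnApplicationState state = report.getYarnApplicationState();
      if (state == YarnApplicationState.FAILED
          || state == YarnApplicationState.KILLED) {
        // Fail fast, e.g. on submission to a non-existent queue.
        throw new YarnException("Application " + appId + " was " + state
            + " during submission: " + report.getDiagnostics());
      }
      if (state != YarnApplicationState.NEW
          && state != YarnApplicationState.NEW_SAVING) {
        return; // submission accepted by the RM
      }
      Thread.sleep(200);
    }
  }
}
{code}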
[jira] [Commented] (YARN-2986) (Umbrella) Support hierarchical and unified scheduler configuration
[ https://issues.apache.org/jira/browse/YARN-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329921#comment-14329921 ] Wangda Tan commented on YARN-2986: -- Ver.1 patch uploaded to YARN-3233 (Umbrella) Support hierarchical and unified scheduler configuration --- Key: YARN-2986 URL: https://issues.apache.org/jira/browse/YARN-2986 Project: Hadoop YARN Issue Type: Improvement Reporter: Vinod Kumar Vavilapalli Assignee: Wangda Tan Attachments: YARN-2986.1.patch Today's scheduler configuration is fragmented and non-intuitive, and needs to be improved. Details in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329431#comment-14329431 ] Naganarasimha G R commented on YARN-2854: - Hi [~zjshen], Based on earlier discussions you had mentioned that we need to target this JIRA for 2.7, and in the community forum there were discussions that the 2.7 release will happen in a week or two. So can you review this doc update, so that we have sufficient time for review and rework? The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329435#comment-14329435 ] Varun Saxena commented on YARN-3197: Hmm... "Non-alive" was meant to indicate that it is not found in aliveContainers in SchedulerApplicationAttempt. Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Attachments: YARN-3197.001.patch, YARN-3197.002.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329436#comment-14329436 ] Marcelo Vanzin commented on YARN-2423: -- Hey everybody, Just wanted to point out that this bug is currently marked as a blocker for integration between Spark and the ATS. It would be really great to avoid having to write our own REST client just to talk to the ATS, and if the same API can be used to support YARN-2928, even better. TimelineClient should wrap all GET APIs to facilitate Java users Key: YARN-2423 URL: https://issues.apache.org/jira/browse/YARN-2423 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Attachments: YARN-2423.004.patch, YARN-2423.005.patch, YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, YARN-2423.patch TimelineClient provides the Java method to put timeline entities. It would also be good to wrap all the GET APIs (both entity and domain) and deserialize the JSON responses into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
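To show what such wrappers would save clients from writing, the sketch below hand-rolls a GET against an ATS v1 REST endpoint and deserializes the JSON into the shared TimelineEntities POJO. The class, method name, base URL, and port are assumptions for illustration only; this is the kind of boilerplate the proposed TimelineClient GET wrappers would absorb, and it is not the API added by the attached patches.

{code:java}
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

import org.apache.hadoop.yarn.api.records.timeline.TimelineEntities;
import org.codehaus.jackson.map.DeserializationConfig;
import org.codehaus.jackson.map.ObjectMapper;

// Hypothetical hand-rolled GET client, NOT the proposed TimelineClient API.
class TimelineGetSketch {
  private final String baseUrl; // e.g. "http://localhost:8188/ws/v1/timeline" (assumed)
  private final ObjectMapper mapper = new ObjectMapper();

  TimelineGetSketch(String baseUrl) {
    this.baseUrl = baseUrl;
    // Be lenient about fields this sketch does not model.
    mapper.configure(
        DeserializationConfig.Feature.FAIL_ON_UNKNOWN_PROPERTIES, false);
  }

  TimelineEntities getEntities(String entityType) throws Exception {
    URL url = new URL(baseUrl + "/" + entityType);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (InputStream in = conn.getInputStream()) {
      // Deserialize the JSON response into the shared POJO so callers don't
      // have to define their own response types.
      return mapper.readValue(in, TimelineEntities.class);
    } finally {
      conn.disconnect();
    }
  }
}
{code}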
[jira] [Commented] (YARN-3166) [Source organization] Decide detailed package structures for timeline service v2 components
[ https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329442#comment-14329442 ] Li Lu commented on YARN-3166: - Hello guys, sorry for the late reply... bq. RM and NM modules will depend on timeline service module? I agree that for the RM this is almost unavoidable. I think it's reasonable to let the RM/NMs (in future) depend on timeline services, but we need to be careful about cyclic dependencies. bq. What is the difference between TimelineStorage and TimelineStorageImpl? We may need some renaming here. In my original thinking, the TimelineStorage class translates operations based on our object model into data storage layer method calls. These methods should be implemented by a TimelineStorageImpl object (and its subclasses, of course). [Source organization] Decide detailed package structures for timeline service v2 components --- Key: YARN-3166 URL: https://issues.apache.org/jira/browse/YARN-3166 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Open this JIRA to track all discussions on detailed package structures for timeline service v2. This JIRA is for discussion only. For our current timeline service v2 design, the aggregator (previously called writer) implementation is in hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.aggregator}} In YARN-2928's design, the next-gen ATS reader is also a server. Maybe we want to put reader-related implementations into hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.reader}} Both readers and aggregators will expose features that may be used by YARN and other 3rd-party components, such as aggregator/reader APIs. For those features, maybe we would like to expose their interfaces in hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? Let's use this JIRA as a centralized place to track all related discussions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
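A minimal sketch of the TimelineStorage / TimelineStorageImpl split described above: the front-end class translates object-model operations into storage-layer calls, and a back end implements them. Everything besides the two names taken from the comment (the entity class, method names, and row-key scheme) is an illustrative assumption, not the package structure that was eventually committed.

{code:java}
import java.io.IOException;

// Placeholder for the timeline v2 entity object model.
class EntitySketch {
  String type;
  String id;
}

// Storage-layer SPI: concrete back ends (HBase, filesystem, ...) extend this.
abstract class TimelineStorageImpl {
  abstract void put(String rowKey, EntitySketch entity) throws IOException;
  abstract EntitySketch get(String rowKey) throws IOException;
}

// Object-model-facing class: translates entity operations into back-end calls.
class TimelineStorage {
  private final TimelineStorageImpl impl;

  TimelineStorage(TimelineStorageImpl impl) {
    this.impl = impl;
  }

  void writeEntity(EntitySketch entity) throws IOException {
    // Translate the object-model operation into a storage-layer row write.
    impl.put(entity.type + "/" + entity.id, entity);
  }

  EntitySketch readEntity(String type, String id) throws IOException {
    return impl.get(type + "/" + id);
  }
}
{code}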
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329441#comment-14329441 ] Wangda Tan commented on YARN-3197: -- If so, you should indicate that this container cannot be found in aliveContainers of SchedulerApplicationAttempt, to be clearer. Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Attachments: YARN-3197.001.patch, YARN-3197.002.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)