[jira] [Updated] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2347: - Attachment: YARN-2347-v2.patch The findbugs issue is not related to this patch; however, it is fixed in the v2 patch anyway. Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common Key: YARN-2347 URL: https://issues.apache.org/jira/browse/YARN-2347 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2347-v2.patch, YARN-2347.patch We have similar things for version state for RM, NM, TS (TimelineServer), etc. I think we should consolidate them into a common object. -- This message was sent by Atlassian JIRA (v6.2#6252)
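The attached patch is not reproduced in this thread; as a rough illustration of the consolidation being proposed, a minimal sketch of a shared version object in yarn-server-common might look like the following. The class and method names (StateVersion, isCompatibleTo) are assumptions for illustration, not taken from the patch:
{code}
// Hypothetical sketch of a shared state/schema version record that the RM, NM and
// TimelineServer state stores could reuse instead of the separate RMStateVersion
// and NMDBSchemaVersion classes. Names are illustrative only.
public final class StateVersion {
  private final int majorVersion;
  private final int minorVersion;

  public StateVersion(int majorVersion, int minorVersion) {
    this.majorVersion = majorVersion;
    this.minorVersion = minorVersion;
  }

  public int getMajorVersion() { return majorVersion; }
  public int getMinorVersion() { return minorVersion; }

  // A typical compatibility rule: the stored state is loadable as long as the
  // major version matches, even if the minor version differs.
  public boolean isCompatibleTo(StateVersion other) {
    return this.majorVersion == other.majorVersion;
  }

  @Override
  public String toString() {
    return majorVersion + "." + minorVersion;
  }
}
{code}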
[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072946#comment-14072946 ] Hadoop QA commented on YARN-2347: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657562/YARN-2347-v2.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4411//console This message is automatically generated. Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common Key: YARN-2347 URL: https://issues.apache.org/jira/browse/YARN-2347 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2347-v2.patch, YARN-2347.patch We have similar things for version state for RM, NM, TS (TimelineServer), etc. I think we should consolidate them into a common object. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2172) Suspend/Resume Hadoop Jobs
[ https://issues.apache.org/jira/browse/YARN-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Chen updated YARN-2172: --- Attachment: Hadoop Job Suspend Resume Design.docx Design Document for Hadoop Job Suspend/Resume Implementation Suspend/Resume Hadoop Jobs -- Key: YARN-2172 URL: https://issues.apache.org/jira/browse/YARN-2172 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager, webapp Affects Versions: 2.2.0 Environment: CentOS 6.5, Hadoop 2.2.0 Reporter: Richard Chen Labels: hadoop, jobs, resume, suspend Fix For: 2.2.0 Attachments: Hadoop Job Suspend Resume Design.docx Original Estimate: 336h Remaining Estimate: 336h In a multi-application cluster environment, jobs running inside Hadoop YARN may be of lower priority than jobs running outside Hadoop YARN, such as HBase. To give way to other higher-priority jobs inside Hadoop, a user or a cluster-level resource scheduling service should be able to suspend and/or resume particular jobs within Hadoop YARN. When target jobs inside Hadoop are suspended, the already allocated and running task containers continue to run until they complete or are actively preempted by other means, but no new containers are allocated to the target jobs. In contrast, when suspended jobs are resumed, they continue from the previous job progress and have new task containers allocated to complete the rest of the job. My team has completed its implementation, and our tests show that it works reliably. -- This message was sent by Atlassian JIRA (v6.2#6252)
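The design document itself is not included in this thread; below is a minimal Java sketch of the allocation-side semantics described above (running containers keep running, but no new containers are handed to a suspended job). The class, JobState enum, and method names are hypothetical illustrations, not the actual patch:
{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a scheduler-side guard that stops handing out new
// containers to suspended jobs while leaving already-running containers untouched.
public class SuspendAwareAllocator {
  enum JobState { RUNNING, SUSPENDED }

  private final Map<String, JobState> jobStates = new HashMap<>();

  public void suspend(String jobId) { jobStates.put(jobId, JobState.SUSPENDED); }
  public void resume(String jobId)  { jobStates.put(jobId, JobState.RUNNING); }

  // Called on each scheduling cycle before granting a new container.
  public boolean mayAllocateNewContainer(String jobId) {
    // Suspended jobs are simply skipped; their running containers are not killed
    // here, they finish on their own or get preempted by other means.
    return jobStates.getOrDefault(jobId, JobState.RUNNING) == JobState.RUNNING;
  }
}
{code}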
[jira] [Updated] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-1063: --- Attachment: (was: YARN-1063.5.patch) Winutils needs ability to create task as domain user Key: YARN-1063 URL: https://issues.apache.org/jira/browse/YARN-1063 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Environment: Windows Reporter: Kyle Leckie Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, YARN-1063.patch
h1. Summary: Securing a Hadoop cluster requires constructing some form of security boundary around the processes executed in YARN containers. Isolation based on Windows user isolation seems most feasible. This approach is similar to the approach taken by the existing LinuxContainerExecutor. The current patch to winutils.exe adds the ability to create a process as a domain user.
h1. Alternative Methods considered:
h2. Process rights limited by security token restriction: On Windows, access decisions are made by examining the security token of a process. It is possible to spawn a process with a restricted security token; any of the rights granted by SIDs of the default token may be restricted. It is possible to see this in action by examining the security token of a sandboxed process launched by a web browser. Typically the launched process will have a fully restricted token and will need to access machine resources through a dedicated broker process that enforces a custom security policy. This broker process mechanism would break compatibility with the typical Hadoop container process, since the container process must be able to use standard function calls for disk and network IO. I performed some work looking at ways to ACL the local files to the specific launched process without granting rights to other processes launched on the same machine, but found this to be an overly complex solution.
h2. Relying on APP containers: Recent versions of Windows have the ability to launch processes within an isolated container. Application containers are supported for execution of WinRT-based executables. This method was ruled out due to the lack of official support for standard Windows APIs. At some point in the future Windows may support functionality similar to BSD jails or Linux containers; at that point support for containers should be added.
h1. Create As User Feature Description:
h2. Usage: A new sub-command was added to the set of task commands. Here is the syntax:
winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
Some notes:
* The username specified is in the format user@domain.
* The machine executing this command must be joined to the domain of the specified user.
* The domain controller must allow the account executing the command access to the user information. For this, join the account to the predefined group labeled Pre-Windows 2000 Compatible Access.
* The account running the command must have several rights on the local machine. These can be managed manually using secpol.msc:
** Act as part of the operating system - SE_TCB_NAME
** Replace a process-level token - SE_ASSIGNPRIMARYTOKEN_NAME
** Adjust memory quotas for a process - SE_INCREASE_QUOTA_NAME
* The launched process will not have rights to the desktop, so it will not be able to display any information or create UI.
* The launched process will have no network credentials; any access of network resources that requires domain authentication will fail.
h2. Implementation: Winutils performs the following steps:
# Enable the required privileges for the current process.
# Register as a trusted process with the Local Security Authority (LSA).
# Create a new logon for the user passed on the command line.
# Load/create a profile on the local machine for the new logon.
# Create a new environment for the new logon.
# Launch the new process in a job with the specified task name, using the created logon.
# Wait for the job to exit.
h2. Future work: The following work was scoped out of this check-in:
* Support for non-domain users or machines that are not domain joined.
* Support for privilege isolation by running the task launcher in a high-privilege service with access over an ACLed named pipe.
-- This message was sent by Atlassian JIRA (v6.2#6252)
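For illustration, a concrete invocation of the createAsUser sub-command documented above might look like the following. The task name, domain user, and command line are placeholder values, and the quoting of the command line is an assumption, not taken from the patch:
{code}
REM Placeholder values: task name "task_0001", domain user "jdoe@EXAMPLE.COM".
winutils task createAsUser task_0001 jdoe@EXAMPLE.COM "cmd /c echo hello"
{code}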
[jira] [Updated] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-1063: --- Attachment: YARN-1063.5.patch I have reloaded patch .5. The previous upload had a whitespace diff that prevented apply to trunk. I had fixed my local branch to remove the ws only diffs from trunk and re-created patch .5. Winutils needs ability to create task as domain user Key: YARN-1063 URL: https://issues.apache.org/jira/browse/YARN-1063 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Environment: Windows Reporter: Kyle Leckie Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, YARN-1063.5.patch, YARN-1063.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072986#comment-14072986 ] Akira AJISAKA commented on YARN-2336: - The patch looks mostly good. I built a pseudo-distributed cluster and verified the JSON response. Some minor comments:
{code}
FSLeafQueue leaf1 = queueManager.getLeafQueue("root.q.subqueue1", true);
FSLeafQueue leaf2 = queueManager.getLeafQueue("root.q.subqueue2", true);
{code}
In the test, the above code exists only to create the leaf queues, and leaf1 and leaf2 are unused, so I think it's better to drop the variables and add a comment as follows:
{code}
// create LeafQueue
queueManager.getLeafQueue("root.q.subqueue1", true);
queueManager.getLeafQueue("root.q.subqueue2", true);
{code}
{code}
public void testClusterSchedulerWithSubQueues() throws JSONException, Exception {
{code}
Would you wrap this line to within 80 characters? In addition, would you please remove the unused import from FairSchedulerQueueInfo.java? Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Kenji Kikushima Assignee: Kenji Kikushima Attachments: YARN-2336-2.patch, YARN-2336.patch When we have sub queues in Fair Scheduler, the REST api returns JSON with a missing '[' bracket for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.2#6252)
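For context, the bug is about the JSON shape of the childQueues field; a sketch of the kind of response expected after the fix is shown below. The queue names and surrounding fields are illustrative only and not taken from the patch -- the point is simply that childQueues should be emitted as a JSON array ('[...]') even for nested queue trees:
{code}
{"childQueues": [
  {"queueName": "root.q.subqueue1", "childQueues": []},
  {"queueName": "root.q.subqueue2", "childQueues": []}
]}
{code}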
[jira] [Commented] (YARN-1050) Document the Fair Scheduler REST API
[ https://issues.apache.org/jira/browse/YARN-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072990#comment-14072990 ] Akira AJISAKA commented on YARN-1050: - [~kj-ki], thank you for filing a JIRA and creating a patch for the issue. Committers, please review YARN-2336 first. The patch needs to be updated after YARN-2336 is committed. Document the Fair Scheduler REST API Key: YARN-1050 URL: https://issues.apache.org/jira/browse/YARN-1050 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Sandy Ryza Assignee: Kenji Kikushima Attachments: YARN-1050-2.patch, YARN-1050-3.patch, YARN-1050.patch The documentation should be placed here along with the Capacity Scheduler documentation: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073008#comment-14073008 ] Hadoop QA commented on YARN-1063: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657572/YARN-1063.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1259 javac compiler warnings (more than the trunk's current 1258 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.ipc.TestIPC {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4412//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4412//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4412//console This message is automatically generated. Winutils needs ability to create task as domain user Key: YARN-1063 URL: https://issues.apache.org/jira/browse/YARN-1063 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Environment: Windows Reporter: Kyle Leckie Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, YARN-1063.5.patch, YARN-1063.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart
[ https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073030#comment-14073030 ] Tsuyoshi OZAWA commented on YARN-2229: -- [~sseth], thanks for your comment. [~jianhe], [~zjshen], after reading the comment by Zhijie, I think the [first design|https://issues.apache.org/jira/browse/YARN-2229] looks better because of cluster-level backward compatibility. Can we agree on going with the first design? ContainerId can overflow with RM restart Key: YARN-2229 URL: https://issues.apache.org/jira/browse/YARN-2229 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2229.1.patch, YARN-2229.10.patch, YARN-2229.10.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, YARN-2229.8.patch, YARN-2229.9.patch In YARN-2052, we changed the containerId format: the upper 10 bits are for the epoch and the lower 22 bits are for the sequence number of the ids. This preserves the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM restarts 1024 times. To avoid the problem, it's better to make containerId a long. On this JIRA we need to define the new containerId format while preserving backward compatibility. -- This message was sent by Atlassian JIRA (v6.2#6252)
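A small sketch of the packing described above (10-bit epoch, 22-bit sequence number in a 32-bit id) shows why the epoch wraps after 1024 RM restarts. This is an illustration of the arithmetic only, not the actual ContainerId code:
{code}
// Illustration of the YARN-2052 packing described above (not the real ContainerId class).
public class ContainerIdPackingDemo {
  static final int EPOCH_BITS = 10;               // upper 10 bits: RM restart epoch
  static final int SEQUENCE_BITS = 22;            // lower 22 bits: container sequence number
  static final int SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1;

  static int pack(int epoch, int sequence) {
    // Once epoch reaches 2^10 = 1024 it no longer fits in the 10-bit field and wraps.
    return (epoch << SEQUENCE_BITS) | (sequence & SEQUENCE_MASK);
  }

  public static void main(String[] args) {
    System.out.println(Integer.toHexString(pack(1, 5)));     // 400005 -> epoch 1, sequence 5
    System.out.println(Integer.toHexString(pack(1024, 5)));  // 5 -> epoch 1024 has overflowed
  }
}
{code}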
[jira] [Commented] (YARN-2313) Livelock can occur in FairScheduler when there are lots of running apps
[ https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073031#comment-14073031 ] Tsuyoshi OZAWA commented on YARN-2313: -- [~kkambatl], thank you for your suggestion. It sounds reasonable and good to me. I'll open a new JIRA to address the maintenance thread. Livelock can occur in FairScheduler when there are lots of running apps --- Key: YARN-2313 URL: https://issues.apache.org/jira/browse/YARN-2313 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.6.0 Attachments: YARN-2313.1.patch, YARN-2313.2.patch, YARN-2313.3.patch, YARN-2313.4.patch, rm-stack-trace.txt Observed a livelock in FairScheduler when there are lots of entries in the queue. After investigating the code, the following case can occur: 1. {{update()}} called by UpdateThread takes longer than UPDATE_INTERVAL (500ms) when there are lots of queues. 2. UpdateThread goes into a busy loop. 3. Other threads (AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
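A simplified sketch of the update-loop hazard described above is shown below. It is not the actual FairScheduler UpdateThread code, just an illustration of how sleeping only for the remaining interval degenerates into a busy loop once update() itself takes longer than the interval:
{code}
// Simplified illustration of the livelock scenario (not the real FairScheduler code).
public class UpdateLoopDemo {
  static final long UPDATE_INTERVAL_MS = 500;

  public static void run() throws InterruptedException {
    while (true) {
      long start = System.currentTimeMillis();
      update();                                         // may take > 500ms with many queues
      long elapsed = System.currentTimeMillis() - start;
      long sleepMs = UPDATE_INTERVAL_MS - elapsed;
      if (sleepMs > 0) {
        Thread.sleep(sleepMs);
      }
      // When elapsed >= UPDATE_INTERVAL_MS the loop never sleeps, so the update
      // thread spins continuously and can starve the other scheduler threads that
      // contend for the same locks -- the livelock reported in this JIRA.
    }
  }

  static void update() { /* recompute fair shares for every queue */ }
}
{code}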
[jira] [Updated] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-1063: --- Attachment: YARN-1063.6.patch patch .6 fixes the extra warning. the IPC test failure I believe is infra related, not patch related. Winutils needs ability to create task as domain user Key: YARN-1063 URL: https://issues.apache.org/jira/browse/YARN-1063 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Environment: Windows Reporter: Kyle Leckie Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1342) Recover container tokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073094#comment-14073094 ] Hudson commented on YARN-1342: -- FAILURE: Integrated in Hadoop-Yarn-trunk #622 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/622/]) YARN-1342. Recover container tokens upon nodemanager restart. Contributed by Jason Lowe. (devaraj: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1612995) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/BaseContainerTokenSecretManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMContainerTokenSecretManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMTokenSecretManagerInNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMContainerTokenSecretManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMTokenSecretManagerInNM.java Recover container tokens upon nodemanager restart - Key: YARN-1342 URL: https://issues.apache.org/jira/browse/YARN-1342 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.6.0 Attachments: YARN-1342.patch, YARN-1342v2.patch, YARN-1342v3-and-YARN-1987.patch, YARN-1342v4.patch, YARN-1342v5.patch, YARN-1342v6.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2300) Document better sample requests for RM web services for submitting apps
[ https://issues.apache.org/jira/browse/YARN-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073100#comment-14073100 ] Hudson commented on YARN-2300: -- FAILURE: Integrated in Hadoop-Yarn-trunk #622 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/622/]) YARN-2300. Improved the documentation of the sample requests for RM REST API - submitting an app. Contributed by Varun Vasudev. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1612981) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm Document better sample requests for RM web services for submitting apps --- Key: YARN-2300 URL: https://issues.apache.org/jira/browse/YARN-2300 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.5.0 Attachments: apache-yarn-2300.0.patch The documentation for RM web services should provide better examples for app submission. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails
[ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073098#comment-14073098 ] Hudson commented on YARN-2147: -- FAILURE: Integrated in Hadoop-Yarn-trunk #622 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/622/]) YARN-2147. client lacks delegation token exception details when application submit fails. Contributed by Chen He (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1612950) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java client lacks delegation token exception details when application submit fails - Key: YARN-2147 URL: https://issues.apache.org/jira/browse/YARN-2147 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Fix For: 3.0.0, 2.6.0 Attachments: YARN-2147-v2.patch, YARN-2147-v3.patch, YARN-2147-v4.patch, YARN-2147-v5.patch, YARN-2147.patch When a client submits an application and the delegation token process fails, the client can lack critical details needed to understand the nature of the error. Only the message of the error exception is conveyed to the client, which sometimes isn't enough to debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2347: - Attachment: YARN-2347-v3.patch Sync patch with latest trunk. Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common Key: YARN-2347 URL: https://issues.apache.org/jira/browse/YARN-2347 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, YARN-2347.patch We have similar things for version state for RM, NM, TS (TimelineServer), etc. I think we should consolidate them into a common object. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1354) Recover applications upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073115#comment-14073115 ] Junping Du commented on YARN-1354: -- Just like [~devaraj.k]'s comments above, [~jlowe], would you like to sync the patch to the latest trunk, given that many related patches got committed recently? Thx! Recover applications upon nodemanager restart - Key: YARN-1354 URL: https://issues.apache.org/jira/browse/YARN-1354 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1354-v1.patch, YARN-1354-v2-and-YARN-1987-and-YARN-1362.patch, YARN-1354-v3.patch The set of active applications in the nodemanager context needs to be recovered for a work-preserving nodemanager restart. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073123#comment-14073123 ] Hadoop QA commented on YARN-1063: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657587/YARN-1063.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.ipc.TestIPC {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4413//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4413//console This message is automatically generated. Winutils needs ability to create task as domain user Key: YARN-1063 URL: https://issues.apache.org/jira/browse/YARN-1063 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Environment: Windows Reporter: Kyle Leckie Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073143#comment-14073143 ] Remus Rusanu commented on YARN-1063: TestIPC.testRetryProxy passes for me locally with the patch applied. The test does not exercise in any way the winutils. Winutils needs ability to create task as domain user Key: YARN-1063 URL: https://issues.apache.org/jira/browse/YARN-1063 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Environment: Windows Reporter: Kyle Leckie Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2349) InvalidStateTransitonException after RM switch
[ https://issues.apache.org/jira/browse/YARN-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073158#comment-14073158 ] Rohith commented on YARN-2349: -- Basically, this happens when the capacity-scheduler.xml configurations of the two RMs do not match. During recovery, the application is moved NEW -> ACCEPTED synchronously by adding the application to the scheduler. Before the scheduler knows about the application, RMAppImpl is moved to ACCEPTED. Any exception (for several reasons) during submitApplication triggers an APP_REJECTED event, which in turn causes the InvalidStateTransitonException. To fix it, either enforce that both RMs' configurations are the same (adding a note about this), OR handle the APP_REJECTED event at the ACCEPTED state. InvalidStateTransitonException after RM switch -- Key: YARN-2349 URL: https://issues.apache.org/jira/browse/YARN-2349 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Nishan Shetty {code} 2014-07-23 19:22:28,272 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2014-07-23 19:22:28,273 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 45018: starting 2014-07-23 19:22:28,266 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APP_REJECTED at ACCEPTED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:635) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:83) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:706) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:690) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-07-23 19:22:28,283 INFO org.mortbay.log: Stopped SelectChannelConnector@10.18.40.84:45020 2014-07-23 19:22:28,291 ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: Error when openning history file of application application_1406116264351_0007 {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
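To illustrate the second option (handling APP_REJECTED while already ACCEPTED) in isolation, here is a small self-contained sketch of an event-driven state table. It is a toy model, not the RMAppImpl state machine; only the state and event names are borrowed from the stack trace above:
{code}
import java.util.HashMap;
import java.util.Map;

// Toy model of the problem: without a registered (ACCEPTED, APP_REJECTED) arc the
// dispatcher has nowhere to go and reports an invalid transition, which is what the
// stack trace above shows. Registering the arc lets recovery-time rejections be absorbed.
public class AppStateMachineDemo {
  enum State { NEW, ACCEPTED, FAILED }
  enum Event { APP_ACCEPTED, APP_REJECTED }

  private final Map<State, Map<Event, State>> arcs = new HashMap<>();
  private State current = State.NEW;

  void addTransition(State pre, Event event, State post) {
    arcs.computeIfAbsent(pre, s -> new HashMap<>()).put(event, post);
  }

  void handle(Event event) {
    State next = arcs.getOrDefault(current, Map.of()).get(event);
    if (next == null) {
      throw new IllegalStateException("Invalid event: " + event + " at " + current);
    }
    current = next;
  }

  public static void main(String[] args) {
    AppStateMachineDemo app = new AppStateMachineDemo();
    app.addTransition(State.NEW, Event.APP_ACCEPTED, State.ACCEPTED);
    // The proposed second fix: explicitly accept APP_REJECTED at ACCEPTED.
    app.addTransition(State.ACCEPTED, Event.APP_REJECTED, State.FAILED);
    app.handle(Event.APP_ACCEPTED);   // NEW -> ACCEPTED
    app.handle(Event.APP_REJECTED);   // ACCEPTED -> FAILED instead of an exception
    System.out.println("final state: " + app.current);
  }
}
{code}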
[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073161#comment-14073161 ] Hadoop QA commented on YARN-2347: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657589/YARN-2347-v3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4414//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4414//console This message is automatically generated. Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common Key: YARN-2347 URL: https://issues.apache.org/jira/browse/YARN-2347 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, YARN-2347.patch We have similar things for version state for RM, NM, TS (TimelineServer), etc. I think we should consolidate them into a common object. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails
[ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073209#comment-14073209 ] Hudson commented on YARN-2147: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1814 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1814/]) YARN-2147. client lacks delegation token exception details when application submit fails. Contributed by Chen He (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1612950) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java client lacks delegation token exception details when application submit fails - Key: YARN-2147 URL: https://issues.apache.org/jira/browse/YARN-2147 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Fix For: 3.0.0, 2.6.0 Attachments: YARN-2147-v2.patch, YARN-2147-v3.patch, YARN-2147-v4.patch, YARN-2147-v5.patch, YARN-2147.patch When a client submits an application and the delegation token process fails, the client can lack critical details needed to understand the nature of the error. Only the message of the error exception is conveyed to the client, which sometimes isn't enough to debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2300) Document better sample requests for RM web services for submitting apps
[ https://issues.apache.org/jira/browse/YARN-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073211#comment-14073211 ] Hudson commented on YARN-2300: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1814 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1814/]) YARN-2300. Improved the documentation of the sample requests for RM REST API - submitting an app. Contributed by Varun Vasudev. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1612981) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm Document better sample requests for RM web services for submitting apps --- Key: YARN-2300 URL: https://issues.apache.org/jira/browse/YARN-2300 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.5.0 Attachments: apache-yarn-2300.0.patch The documentation for RM web services should provide better examples for app submission. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1342) Recover container tokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073205#comment-14073205 ] Hudson commented on YARN-1342: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1814 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1814/]) YARN-1342. Recover container tokens upon nodemanager restart. Contributed by Jason Lowe. (devaraj: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1612995) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/BaseContainerTokenSecretManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMContainerTokenSecretManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMTokenSecretManagerInNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMContainerTokenSecretManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMTokenSecretManagerInNM.java Recover container tokens upon nodemanager restart - Key: YARN-1342 URL: https://issues.apache.org/jira/browse/YARN-1342 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.6.0 Attachments: YARN-1342.patch, YARN-1342v2.patch, YARN-1342v3-and-YARN-1987.patch, YARN-1342v4.patch, YARN-1342v5.patch, YARN-1342v6.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2350) TestApplicationMasterServiceOnHA fails with InvalidToken exception
Ted Yu created YARN-2350: Summary: TestApplicationMasterServiceOnHA fails with InvalidToken exception Key: YARN-2350 URL: https://issues.apache.org/jira/browse/YARN-2350 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu From https://builds.apache.org/job/Hadoop-Yarn-trunk/622 : {code} Running org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.591 sec FAILURE! - in org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA testAllocateOnHA(org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA) Time elapsed: 8.408 sec ERROR! org.apache.hadoop.security.token.SecretManager$InvalidToken: Given AMRMToken for application : appattempt_1000_0001_00 seems to have been generated illegally. at org.apache.hadoop.ipc.Client.call(Client.java:1411) at org.apache.hadoop.ipc.Client.call(Client.java:1364) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy85.allocate(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy86.allocate(Unknown Source) at org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA.testAllocateOnHA(TestApplicationMasterServiceOnHA.java:84) {code} This is reproducible locally. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073218#comment-14073218 ] Allen Wittenauer commented on YARN-2348: -1 as written. Properly set up servers typically have their time set to UTC. Changing the display here will conflict with what is in the log files. If you want to display a different locale on the Web UI, then it needs to be selectable. ResourceManager web UI should display locale time instead of UTC time - Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: 1.before-change.jpg, 2.after-change.jpg, YARN-2348.patch The ResourceManager web UI, including the application list and scheduler, displays UTC time by default; this will confuse users who do not use UTC time. The web UI should display the user's local time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073218#comment-14073218 ] Allen Wittenauer edited comment on YARN-2348 at 7/24/14 2:07 PM: - -1 Properly set up servers typically have their time set to UTC. Changing the display here will conflict with what is in the log files. If you want to display a different locale on the Web UI, then it needs to be selectable. was (Author: aw): -1 as written. Properly set up servers typically have their time set to UTC. Changing the display here will conflict with what is in the log files. If you want to display a different locale on the Web UI, then it needs to be selectable. ResourceManager web UI should display locale time instead of UTC time - Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: 1.before-change.jpg, 2.after-change.jpg, YARN-2348.patch The ResourceManager web UI, including the application list and scheduler, displays UTC time by default; this will confuse users who do not use UTC time. The web UI should display the user's local time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1342) Recover container tokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073263#comment-14073263 ] Hudson commented on YARN-1342: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1841 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1841/]) YARN-1342. Recover container tokens upon nodemanager restart. Contributed by Jason Lowe. (devaraj: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1612995) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/BaseContainerTokenSecretManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMContainerTokenSecretManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMTokenSecretManagerInNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMContainerTokenSecretManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMTokenSecretManagerInNM.java Recover container tokens upon nodemanager restart - Key: YARN-1342 URL: https://issues.apache.org/jira/browse/YARN-1342 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.6.0 Attachments: YARN-1342.patch, YARN-1342v2.patch, YARN-1342v3-and-YARN-1987.patch, YARN-1342v4.patch, YARN-1342v5.patch, YARN-1342v6.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2300) Document better sample requests for RM web services for submitting apps
[ https://issues.apache.org/jira/browse/YARN-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073269#comment-14073269 ] Hudson commented on YARN-2300: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1841 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1841/]) YARN-2300. Improved the documentation of the sample requests for RM REST API - submitting an app. Contributed by Varun Vasudev. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1612981) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm Document better sample requests for RM web services for submitting apps --- Key: YARN-2300 URL: https://issues.apache.org/jira/browse/YARN-2300 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.5.0 Attachments: apache-yarn-2300.0.patch The documentation for RM web services should provide better examples for app submission. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails
[ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073267#comment-14073267 ] Hudson commented on YARN-2147: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1841 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1841/]) YARN-2147. client lacks delegation token exception details when application submit fails. Contributed by Chen He (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1612950) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java client lacks delegation token exception details when application submit fails - Key: YARN-2147 URL: https://issues.apache.org/jira/browse/YARN-2147 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Fix For: 3.0.0, 2.6.0 Attachments: YARN-2147-v2.patch, YARN-2147-v3.patch, YARN-2147-v4.patch, YARN-2147-v5.patch, YARN-2147.patch When an client submits an application and the delegation token process fails the client can lack critical details needed to understand the nature of the error. Only the message of the error exception is conveyed to the client, which sometimes isn't enough to debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2351) YARN CLI should provide a command to list the configurations in use
Zhijie Shen created YARN-2351: - Summary: YARN CLI should provide a command to list the configurations in use Key: YARN-2351 URL: https://issues.apache.org/jira/browse/YARN-2351 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.6.0 Reporter: Zhijie Shen To more easily understand the expected behavior of a YARN component, it would be good for the command line to be able to print the configurations in use for the RM, NM and timeline server daemons, as we can do now via the web interfaces: {code} http://RM|NM|Timeline host:port/conf {code} The command line could be something like: {code} yarn conf resourcemanager|nodemanager|timelineserver [host] {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2351) YARN CLI should provide a command to list the configurations in use
[ https://issues.apache.org/jira/browse/YARN-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073324#comment-14073324 ] Allen Wittenauer commented on YARN-2351: hdfs already has getconf, so this should be an analog and/or expansion of that command for consistency. YARN CLI should provide a command to list the configurations in use --- Key: YARN-2351 URL: https://issues.apache.org/jira/browse/YARN-2351 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.6.0 Reporter: Zhijie Shen To more easily understand the expected behavior of a YARN component, it would be good for the command line to be able to print the configurations in use for the RM, NM and timeline server daemons, as we can do now via the web interfaces: {code} http://RM|NM|Timeline host:port/conf {code} The command line could be something like: {code} yarn conf resourcemanager|nodemanager|timelineserver [host] {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073328#comment-14073328 ] Alejandro Abdelnur commented on YARN-2348: -- Allen's suggestion of making it selectable from the browser makes sense. In Oozie, we are doing this. Because JavaScript does not have built-in libraries for TZ handling, what we did is: * have a request parameter that specifies the desired TZ for datetime values; the default value is UTC. * TZ conversion happens on the server side when producing the JSON output, using the TZ request parameter. * have a REST call that returns the list of available TZs. * have a dropdown in the UI that shows the available TZs (using the REST call from the previous bullet) * use a cookie to remember the user-selected TZ * if the cookie is present, set the TZ request parameter from it. ResourceManager web UI should display locale time instead of UTC time - Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: 1.before-change.jpg, 2.after-change.jpg, YARN-2348.patch The ResourceManager web UI, including the application list and scheduler, displays UTC time by default; this will confuse users who do not use UTC time. The web UI should display the user's local time. -- This message was sent by Atlassian JIRA (v6.2#6252)
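A minimal illustration of the server-side conversion described above, using only standard JDK date formatting. The parameter name, the defaulting behavior, and the class name are assumptions for illustration, not the actual Oozie or YARN implementation.
{code}
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimestampFormatter {
  // Format an epoch-millis timestamp in the time zone requested by the client.
  // Defaults to UTC when no tz parameter is supplied; unknown IDs fall back to GMT.
  public static String format(long epochMillis, String tzParam) {
    TimeZone tz = (tzParam == null || tzParam.isEmpty())
        ? TimeZone.getTimeZone("UTC")
        : TimeZone.getTimeZone(tzParam);
    SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy");
    fmt.setTimeZone(tz);
    return fmt.format(new Date(epochMillis));
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    System.out.println(format(now, null));                  // UTC default
    System.out.println(format(now, "America/Los_Angeles")); // user-selected TZ
  }
}
{code}
The list of IDs for the dropdown could come from TimeZone.getAvailableIDs(), and the chosen value would round-trip through the cookie as described in the comment.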
[jira] [Commented] (YARN-2351) YARN CLI should provide a command to list the configurations in use
[ https://issues.apache.org/jira/browse/YARN-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073334#comment-14073334 ] Zhijie Shen commented on YARN-2351: --- Noticed that. Agreed that we can do a similar thing for YARN. YARN CLI should provide a command to list the configurations in use --- Key: YARN-2351 URL: https://issues.apache.org/jira/browse/YARN-2351 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.6.0 Reporter: Zhijie Shen To more easily understand the expected behavior of a YARN component, it would be good for the command line to be able to print the configurations in use for the RM, NM and timeline server daemons, as we can do now via the web interfaces: {code} http://RM|NM|Timeline host:port/conf {code} The command line could be something like: {code} yarn conf resourcemanager|nodemanager|timelineserver [host] {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2351) YARN CLI should provide a command to list the configurations in use
[ https://issues.apache.org/jira/browse/YARN-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073348#comment-14073348 ] Allen Wittenauer commented on YARN-2351: The big thing is consistency... so you're getting yarn getconf as the subcommand. :) YARN CLI should provide a command to list the configurations in use --- Key: YARN-2351 URL: https://issues.apache.org/jira/browse/YARN-2351 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.6.0 Reporter: Zhijie Shen To more easily understand the expected behavior of a YARN component, it would be good for the command line to be able to print the configurations in use for the RM, NM and timeline server daemons, as we can do now via the web interfaces: {code} http://RM|NM|Timeline host:port/conf {code} The command line could be something like: {code} yarn conf resourcemanager|nodemanager|timelineserver [host] {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
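A rough sketch of what a getconf-style subcommand could do under the approach proposed here: fetch the daemon's existing /conf servlet over HTTP and print it. The class name, argument handling, and output format are assumptions for illustration; the actual command would presumably share plumbing with hdfs getconf.
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: print the live configuration of a YARN daemon by
// reading its /conf endpoint, e.g. http://rmhost:8088/conf
public class GetConf {
  public static void main(String[] args) throws Exception {
    if (args.length < 1) {
      System.err.println("usage: GetConf <daemon-host:port>");
      System.exit(1);
    }
    URL url = new URL("http://" + args[0] + "/conf");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);   // dump of the in-use configuration
      }
    } finally {
      conn.disconnect();
    }
  }
}
{code}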
[jira] [Commented] (YARN-2313) Livelock can occur in FairScheduler when there are lots of running apps
[ https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073455#comment-14073455 ] Karthik Kambatla commented on YARN-2313: [~ozawa] - thanks. I have started looking at it and we can do it on YARN-2328. I hope you haven't also started working on it. Livelock can occur in FairScheduler when there are lots of running apps --- Key: YARN-2313 URL: https://issues.apache.org/jira/browse/YARN-2313 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.6.0 Attachments: YARN-2313.1.patch, YARN-2313.2.patch, YARN-2313.3.patch, YARN-2313.4.patch, rm-stack-trace.txt Observed a livelock in FairScheduler when there are lots of entries in the queue. After investigating the code, the following case can occur: 1. {{update()}} called by UpdateThread takes longer than UPDATE_INTERVAL (500ms) when there are many queues. 2. UpdateThread goes into a busy loop. 3. Other threads (AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
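The reported busy loop can be pictured with a periodic update loop like the sketch below, which is illustrative only and not the actual FairScheduler code. If runUpdate() takes longer than the interval, the computed sleep becomes non-positive and the thread spins, starving other threads that need the scheduler lock.
{code}
// Illustrative periodic updater, assuming the 500 ms interval from the report.
public class UpdateLoop implements Runnable {
  private static final long UPDATE_INTERVAL_MS = 500;
  private volatile boolean running = true;

  @Override
  public void run() {
    while (running) {
      long start = System.currentTimeMillis();
      runUpdate();                                  // may exceed the interval under load
      long elapsed = System.currentTimeMillis() - start;
      long sleep = UPDATE_INTERVAL_MS - elapsed;
      if (sleep > 0) {
        try {
          Thread.sleep(sleep);                      // normal case: wait out the remainder
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
      }
      // If sleep <= 0 the loop immediately re-enters runUpdate() and
      // re-acquires the scheduler lock: the busy-loop behavior described above.
    }
  }

  private void runUpdate() { /* recompute fair shares, demands, etc. */ }
}
{code}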
[jira] [Commented] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance
[ https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073467#comment-14073467 ] Karthik Kambatla commented on YARN-2352: Once this is done, we can look into getting statistics over a sliding window - either in the number of calls or time. FairScheduler: Collect metrics on duration of critical methods that affect performance -- Key: YARN-2352 URL: https://issues.apache.org/jira/browse/YARN-2352 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla We need more metrics for better visibility into FairScheduler performance. At the least, we need to do this for (1) handle node events, (2) update, (3) compute fairshares, (4) preemption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance
Karthik Kambatla created YARN-2352: -- Summary: FairScheduler: Collect metrics on duration of critical methods that affect performance Key: YARN-2352 URL: https://issues.apache.org/jira/browse/YARN-2352 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla We need more metrics for better visibility into FairScheduler performance. At the least, we need to do this for (1) handle node events, (2) update, (3) compute fairshares, (4) preemption. -- This message was sent by Atlassian JIRA (v6.2#6252)
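A minimal way to capture the durations mentioned above is to time each critical section and accumulate the samples. The sketch below uses plain nanoTime measurements and in-memory counters as placeholders; a real implementation would presumably publish through the Hadoop metrics2 system rather than these fields.
{code}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative timing wrapper for a critical scheduler method.
public class SchedulerTimings {
  private final AtomicLong updateCalls = new AtomicLong();
  private final AtomicLong updateTotalMicros = new AtomicLong();

  public void timedUpdate(Runnable update) {
    long start = System.nanoTime();
    try {
      update.run();                         // e.g. the scheduler's update() body
    } finally {
      long micros = TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start);
      updateCalls.incrementAndGet();
      updateTotalMicros.addAndGet(micros);  // placeholder for a metrics2 rate metric
    }
  }

  public long averageUpdateMicros() {
    long calls = updateCalls.get();
    return calls == 0 ? 0 : updateTotalMicros.get() / calls;
  }
}
{code}
The sliding-window statistics suggested in the comment could replace the running totals with a bounded ring buffer of recent samples.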
[jira] [Created] (YARN-2353) FairScheduler: Update demand asynchronously instead of in the Update Thread
Karthik Kambatla created YARN-2353: -- Summary: FairScheduler: Update demand asynchronously instead of in the Update Thread Key: YARN-2353 URL: https://issues.apache.org/jira/browse/YARN-2353 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chang li reassigned YARN-2308: -- Assignee: chang li NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: chang li Priority: Critical I encountered an NPE when the RM restarted {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And the RM fails to restart. This is caused by a queue configuration change: I removed some queues and added new queues. So when the RM restarts, it tries to recover historical applications, and when any of those applications' queues has been removed, an NPE is raised. -- This message was sent by Atlassian JIRA (v6.2#6252)
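One possible shape of a fix, sketched here as an assumption and not the attached patch, is to check during recovery whether the application's queue still exists and fail the recovered attempt cleanly instead of dereferencing a null queue. The names getQueue()-style lookup and rejectAttempt() below are hypothetical stand-ins for the real CapacityScheduler internals.
{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative guard during application-attempt recovery.
class RecoveryGuardSketch {
  private final Map<String, Object> queues = new HashMap<>();

  void addApplicationAttempt(String queueName, String appAttemptId) {
    Object queue = queues.get(queueName);
    if (queue == null) {
      // The queue was removed from capacity-scheduler.xml before the restart:
      // fail the recovered attempt with a clear message instead of an NPE.
      rejectAttempt(appAttemptId,
          "Queue " + queueName + " no longer exists; cannot recover attempt");
      return;
    }
    // ... normal submission path against the existing queue ...
  }

  private void rejectAttempt(String appAttemptId, String reason) {
    System.err.println("Rejecting " + appAttemptId + ": " + reason);
  }
}
{code}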
[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073550#comment-14073550 ] Jian He commented on YARN-2211: --- - This code not needed? may remove the newInstance() method also {code} AMRMTokenSecretManagerState amrmTokenSecretManagerState = AMRMTokenSecretManagerState.newInstance(); {code} - currentKey will never be null ? if so, we can remove the check. {code} if (currentKey != null) { this.currentMasterKey = new MasterKeyData(currentKey, createSecretKey(currentKey.getBytes() .array())); } if (currentMasterKey != null {code} - Instead of moving the following to yarn_proto, we should probably have a separate jira to move all the RM recovery related records to resource manager module. For now, I think we can create a new proto file and move amrm token state there.{code} message MasterKeyProto { optional int32 key_id = 1; optional bytes bytes = 2; }{code} RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens -- Key: YARN-2211 URL: https://issues.apache.org/jira/browse/YARN-2211 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, YARN-2211.4.patch, YARN-2211.5.1.patch, YARN-2211.5.patch, YARN-2211.6.1.patch, YARN-2211.6.patch After YARN-2208, AMRMToken can be rolled over periodically. We need to save related Master Keys and use them to recover the AMRMToken when RM restart/failover happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2354) DistributedShell may allocate more containers than client specified after it restarts
Jian He created YARN-2354: - Summary: DistributedShell may allocate more containers than client specified after it restarts Key: YARN-2354 URL: https://issues.apache.org/jira/browse/YARN-2354 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He To reproduce, run distributed shell with the -num_containers option. In ApplicationMaster.java, the following code has an issue. {code} int numTotalContainersToRequest = numTotalContainers - previousAMRunningContainers.size(); for (int i = 0; i < numTotalContainersToRequest; ++i) { ContainerRequest containerAsk = setupContainerAskForRM(); amRMClient.addContainerRequest(containerAsk); } numRequestedContainers.set(numTotalContainersToRequest); {code} numRequestedContainers doesn't account for the previous AM's requested containers, so numRequestedContainers should be set to numTotalContainers -- This message was sent by Atlassian JIRA (v6.2#6252)
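For clarity, this is the same snippet with the one-line correction the description suggests; it reuses the fields and helpers from ApplicationMaster.java shown above and is a sketch of the described fix, not the eventual patch.
{code}
// After an AM restart, only ask for the containers that are still missing,
// but record the full total as "requested" so the AM does not over-allocate.
int numTotalContainersToRequest =
    numTotalContainers - previousAMRunningContainers.size();
for (int i = 0; i < numTotalContainersToRequest; ++i) {
  ContainerRequest containerAsk = setupContainerAskForRM();
  amRMClient.addContainerRequest(containerAsk);
}
// Count the containers carried over from the previous attempt as already
// requested, per the issue description.
numRequestedContainers.set(numTotalContainers);
{code}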
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073574#comment-14073574 ] chang li commented on YARN-2308: I am working on it NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: chang li Priority: Critical I encountered a NPE when RM restart {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And RM will be failed to restart. This is caused by queue configuration changed, I removed some queues and added new queues. So when RM restarts, it tries to recover history applications, and when any of queues of these applications removed, NPE will be raised. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073577#comment-14073577 ] Xuan Gong commented on YARN-2211: - bq. This code not needed? may remove the newInstance() method also It is used in RMStateStore initialization. bq. currentKey will never be null ? if so, we can remove the check. Might need to keep the NULL check. If the RM starts from a brand-new state, there are no saved states at all. So, the currentKey is NULL. bq. Instead of moving the following to yarn_proto, we should probably have a separate jira to move all the RM recovery related records to resource manager module. For now, I think we can create a new proto file and move amrm token state there. We can do that. RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens -- Key: YARN-2211 URL: https://issues.apache.org/jira/browse/YARN-2211 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, YARN-2211.4.patch, YARN-2211.5.1.patch, YARN-2211.5.patch, YARN-2211.6.1.patch, YARN-2211.6.patch After YARN-2208, AMRMToken can be rolled over periodically. We need to save the related Master Keys and use them to recover the AMRMToken when RM restart/failover happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1336) Work-preserving nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1336: - Attachment: YARN-1336-rollup-v2.patch Refreshing the rollup patch to latest trunk so it's easier for people to play with the feature and get a general sense of things before the rest of the patches are integrated. Notable fixes since the last rollup patch include fixing container reacquisition and avoiding deleting log directories on NM teardown when we're restarting. Work-preserving nodemanager restart --- Key: YARN-1336 URL: https://issues.apache.org/jira/browse/YARN-1336 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: NMRestartDesignOverview.pdf, YARN-1336-rollup-v2.patch, YARN-1336-rollup.patch This serves as an umbrella ticket for tasks related to work-preserving nodemanager restart. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2328) FairScheduler: Verify update and continuous scheduling threads are stopped when the scheduler is stopped
[ https://issues.apache.org/jira/browse/YARN-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073687#comment-14073687 ] Karthik Kambatla commented on YARN-2328: As per the discussion on YARN-2313, it might be better to have a single thread run all background tasks, so that we can guarantee the foreground tasks a dedicated chunk of time (with no thread holding a lock on the FairScheduler) equal to the smallest of these tasks' periods. FairScheduler: Verify update and continuous scheduling threads are stopped when the scheduler is stopped Key: YARN-2328 URL: https://issues.apache.org/jira/browse/YARN-2328 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Minor Attachments: yarn-2328-1.patch, yarn-2328-2.patch, yarn-2328-2.patch FairScheduler threads can use a little cleanup and tests. To begin with, the update and continuous-scheduling threads should extend Thread and handle being interrupted. We should have tests for starting and stopping them as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
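The single-runner idea could look roughly like the following sketch, which uses a one-thread ScheduledExecutorService so the periodic tasks never overlap. The class name and the periods are placeholders, and the real runner in the later preview patch may differ.
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative single-threaded runner for FairScheduler background work.
public class BackgroundTasksRunner {
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor();

  public void start(Runnable update, Runnable continuousScheduling) {
    // Both tasks share one thread, so only one of them can be running (and
    // holding the scheduler lock) at any time; they cannot starve each other.
    executor.scheduleWithFixedDelay(update, 0, 500, TimeUnit.MILLISECONDS);
    executor.scheduleWithFixedDelay(continuousScheduling, 0, 5, TimeUnit.MILLISECONDS);
  }

  public void stop() {
    executor.shutdownNow();   // interrupts the tasks so the scheduler can stop cleanly
  }
}
{code}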
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073727#comment-14073727 ] Mayank Bansal commented on YARN-2069: - Thanks [~vinodkv] for the review. I have changed the patch based on the targeted capacity for the queue. It balances out with the users' resources. I also removed the two passes and now it's only one pass. Please review it. Thanks, Mayank CS queue level preemption should respect user-limits Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, YARN-2069-trunk-6.patch This is different from (even if related to, and likely shares code with) YARN-2113. YARN-2113 focuses on making sure that even if a queue has its guaranteed capacity, its individual users are treated in line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2069: Attachment: YARN-2069-trunk-6.patch CS queue level preemption should respect user-limits Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, YARN-2069-trunk-6.patch This is different from (even if related to, and likely share code with) YARN-2113. YARN-2113 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1354) Recover applications upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1354: - Attachment: YARN-1354-v4.patch Thanks for the interest, Devaraj and Junping! I updated patch to trunk. Recover applications upon nodemanager restart - Key: YARN-1354 URL: https://issues.apache.org/jira/browse/YARN-1354 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1354-v1.patch, YARN-1354-v2-and-YARN-1987-and-YARN-1362.patch, YARN-1354-v3.patch, YARN-1354-v4.patch The set of active applications in the nodemanager context need to be recovered for work-preserving nodemanager restart -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2354) DistributedShell may allocate more containers than client specified after it restarts
[ https://issues.apache.org/jira/browse/YARN-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu reassigned YARN-2354: --- Assignee: Li Lu DistributedShell may allocate more containers than client specified after it restarts - Key: YARN-2354 URL: https://issues.apache.org/jira/browse/YARN-2354 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Li Lu To reproduce, run distributed shell with the -num_containers option. In ApplicationMaster.java, the following code has an issue. {code} int numTotalContainersToRequest = numTotalContainers - previousAMRunningContainers.size(); for (int i = 0; i < numTotalContainersToRequest; ++i) { ContainerRequest containerAsk = setupContainerAskForRM(); amRMClient.addContainerRequest(containerAsk); } numRequestedContainers.set(numTotalContainersToRequest); {code} numRequestedContainers doesn't account for the previous AM's requested containers, so numRequestedContainers should be set to numTotalContainers -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-415: Attachment: YARN-415.201407242148.txt [~leftnoteasy] Thank you very much for helping me review this. {quote} 1) RMAppAttemptImpl.java 1.1 There're some irrelevant line changes in RMAppAttemptImpl, could you please revert them? Like {code} RMAppAttemptEventType.RECOVER, new AttemptRecoveredTransition()) - + {code} {quote} Changes completed. {quote} 1.2 getResourceUtilization: {code} +if (rmApps != null) { + RMApp app = rmApps.get(attemptId.getApplicationId()); + if (app != null) { {code} I think the two null cases cannot happen; we don't need to check for null to avoid a potential bug here {quote} Changes completed. {quote} {code} + ApplicationResourceUsageReport appResUsageRpt = {code} It's better to name it appResUsageReport since rpt is not a common abbr of report. {quote} Changes completed. {quote} 2) RMContainerImpl.java 2.1 updateAttemptMetrics: {code} if (rmApps != null) { RMApp rmApp = rmApps.get(container.getApplicationAttemptId().getApplicationId()); if (rmApp != null) { {code} Again, I think the two null checks are unnecessary {quote} I was able to remove the rmApps variable, but I had to leave the check for {{app != null}} because if I try to take that out, several unit tests would fail with NullPointerException. Even with removing the rmApps variable, I needed to change TestRMContainerImpl.java to mock rmContext.getRMApps(). {quote} 3) SchedulerApplicationAttempt.java 3.1 Some rename suggestions: (Please let me know if you have a better idea) CACHE_MILLI - MEMORY_UTILIZATION_CACHE_MILLISECONDS lastTime - lastMemoryUtilizationUpdateTime cachedMemorySeconds - lastMemorySeconds same for cachedVCore ... {quote} Changes complete. {quote} 4) AppBlock.java Should we rename Resource Seconds: to Resource Utilization or something? {quote} I changed it as you suggested. It feels like there should be something that would describe it better, but I can't think of anything right now. {quote} 5) Test 5.1 I'm wondering if we need to add an end-to-end test, since we changed RMAppAttempt/RMContainerImpl/SchedulerApplicationAttempt. It can consist of submitting an application, launching several containers, and finishing the application. And it's better to make the launched application contain several application attempts. While the application is running, there are multiple containers running, and multiple containers finished. We can check whether the total resource utilization is as expected. {quote} I'm still working on the unit tests as you suggested, but I wanted to get the rest of the patch up first so you can look at it :-) {quote} bq. One thing I did notice when these values are cached is that there is a race where containers can get counted twice: I think this cannot be avoided; it should be a transient state, and Jian He and I discussed this a long time ago. But apparently the 3-second cache makes it more than a transient state. I suggest making lastTime in SchedulerApplicationAttempt protected. Then in FiCaSchedulerApp/FSSchedulerApp, when removing a container from liveContainers (in the completedContainer method), you can set lastTime to a negative value like -1, and the next time the accumulated resource utilization is requested, it will recompute all container utilization. {quote} I made the changes to {{lastTime}} as you suggested. I agree that there will always be a possibility that the container will still be in the {{liveContainers}} list for a very brief period after the container has finished.
With the cached values the way I had them before, this gap was noticeable in the resource calculations. Your suggested changes brought the race back down even for the cached values. Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an
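A compressed view of the caching-plus-invalidation scheme discussed in this exchange: resource-seconds are recomputed at most every few seconds, and completing a container resets the timestamp so the next read recomputes immediately. The field and method names below borrow the suggested renames but are illustrative, not the actual SchedulerApplicationAttempt code.
{code}
// Illustrative cache of accumulated memory-seconds for an application attempt.
public class ResourceUsageCache {
  private static final long MEMORY_UTILIZATION_CACHE_MILLIS = 3000;

  private long lastMemoryUtilizationUpdateTime = -1;   // -1 forces a recompute
  private long lastMemorySeconds;

  public synchronized long getMemorySeconds() {
    long now = System.currentTimeMillis();
    if (lastMemoryUtilizationUpdateTime < 0
        || now - lastMemoryUtilizationUpdateTime > MEMORY_UTILIZATION_CACHE_MILLIS) {
      lastMemorySeconds = recomputeFromLiveAndFinishedContainers();
      lastMemoryUtilizationUpdateTime = now;
    }
    return lastMemorySeconds;
  }

  // Called when a container leaves liveContainers, per the review suggestion,
  // so the next read does not double-count it.
  public synchronized void invalidate() {
    lastMemoryUtilizationUpdateTime = -1;
  }

  private long recomputeFromLiveAndFinishedContainers() {
    return 0L;   // placeholder for summing per-container memory-seconds
  }
}
{code}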
[jira] [Updated] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2069: Attachment: YARN-2069-trunk-7.patch Updated patch Thanks, Mayank CS queue level preemption should respect user-limits Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch This is different from (even if related to, and likely share code with) YARN-2113. YARN-2113 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2328) FairScheduler: Verify update and continuous scheduling threads are stopped when the scheduler is stopped
[ https://issues.apache.org/jira/browse/YARN-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2328: --- Attachment: yarn-2328-preview.patch Here is a preview patch that introduces FSBackgroundTasksRunner. As of now, only UpdateThread and ContinuousSchedulingThread are added here. We might want to move the AllocationFileLoaderService also here. Appreciate any initial feedback on the approach or the patch. FairScheduler: Verify update and continuous scheduling threads are stopped when the scheduler is stopped Key: YARN-2328 URL: https://issues.apache.org/jira/browse/YARN-2328 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Minor Attachments: yarn-2328-1.patch, yarn-2328-2.patch, yarn-2328-2.patch, yarn-2328-preview.patch FairScheduler threads can use a little cleanup and tests. To begin with, the update and continuous-scheduling threads should extend Thread and handle being interrupted. We should have tests for starting and stopping them as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2211: Attachment: YARN-2211.7.patch RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens -- Key: YARN-2211 URL: https://issues.apache.org/jira/browse/YARN-2211 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, YARN-2211.4.patch, YARN-2211.5.1.patch, YARN-2211.5.patch, YARN-2211.6.1.patch, YARN-2211.6.patch, YARN-2211.7.patch After YARN-2208, AMRMToken can be rolled over periodically. We need to save related Master Keys and use them to recover the AMRMToken when RM restart/failover happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073791#comment-14073791 ] Xuan Gong commented on YARN-2211: - new patch addressed all latest comments RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens -- Key: YARN-2211 URL: https://issues.apache.org/jira/browse/YARN-2211 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, YARN-2211.4.patch, YARN-2211.5.1.patch, YARN-2211.5.patch, YARN-2211.6.1.patch, YARN-2211.6.patch, YARN-2211.7.patch After YARN-2208, AMRMToken can be rolled over periodically. We need to save related Master Keys and use them to recover the AMRMToken when RM restart/failover happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073874#comment-14073874 ] Hadoop QA commented on YARN-2211: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657706/YARN-2211.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4419//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4419//console This message is automatically generated. RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens -- Key: YARN-2211 URL: https://issues.apache.org/jira/browse/YARN-2211 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, YARN-2211.4.patch, YARN-2211.5.1.patch, YARN-2211.5.patch, YARN-2211.6.1.patch, YARN-2211.6.patch, YARN-2211.7.patch After YARN-2208, AMRMToken can be rolled over periodically. We need to save related Master Keys and use them to recover the AMRMToken when RM restart/failover happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2211: Attachment: YARN-2211.7.1.patch RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens -- Key: YARN-2211 URL: https://issues.apache.org/jira/browse/YARN-2211 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, YARN-2211.4.patch, YARN-2211.5.1.patch, YARN-2211.5.patch, YARN-2211.6.1.patch, YARN-2211.6.patch, YARN-2211.7.1.patch, YARN-2211.7.patch After YARN-2208, AMRMToken can be rolled over periodically. We need to save related Master Keys and use them to recover the AMRMToken when RM restart/failover happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2335) Annotate all hadoop-sls APIs as @Private
[ https://issues.apache.org/jira/browse/YARN-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073896#comment-14073896 ] Hadoop QA commented on YARN-2335: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657215/YARN-2335-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4420//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4420//console This message is automatically generated. Annotate all hadoop-sls APIs as @Private Key: YARN-2335 URL: https://issues.apache.org/jira/browse/YARN-2335 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2335-1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073899#comment-14073899 ] Wangda Tan commented on YARN-2308: -- [~lichangleo], thanks for working on it! Looking forward your patch. NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: chang li Priority: Critical I encountered a NPE when RM restart {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And RM will be failed to restart. This is caused by queue configuration changed, I removed some queues and added new queues. So when RM restarts, it tries to recover history applications, and when any of queues of these applications removed, NPE will be raised. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2209) Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2209: -- Attachment: YARN-2209.2.patch Patch rebased on trunk Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart Key: YARN-2209 URL: https://issues.apache.org/jira/browse/YARN-2209 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-2209.1.patch, YARN-2209.2.patch YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate application to re-register on RM restart. we should do the same for AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2209) Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073918#comment-14073918 ] Hadoop QA commented on YARN-2209: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657737/YARN-2209.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4422//console This message is automatically generated. Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart Key: YARN-2209 URL: https://issues.apache.org/jira/browse/YARN-2209 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-2209.1.patch, YARN-2209.2.patch YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate application to re-register on RM restart. we should do the same for AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2214) preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness
[ https://issues.apache.org/jira/browse/YARN-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073921#comment-14073921 ] Karthik Kambatla commented on YARN-2214: Patch looks mostly good. One nit: can we move the check in FSLeafQueue#preemptContainer before the debug logging? preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness --- Key: YARN-2214 URL: https://issues.apache.org/jira/browse/YARN-2214 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: YARN-2214-v1.txt preemptContainerPreCheck() in FSParentQueue rejects preemption requests if the parent queue is below fair share. This can cause a delay in converging towards fairness when the starved leaf queue and the queue above fair share belong under a non-root parent queue (i.e. their least common ancestor is a parent queue which is not root). Here is an example: root.parent has fair share = 80% and usage = 80% root.parent.child1 has fair share = 40% and usage = 80% root.parent.child2 has fair share = 40% and usage = 0% Now a job is submitted to child2 and the demand is 40%. Preemption will kick in and try to reclaim all the 40% from child1. When it preempts the first container from child1, the usage of root.parent will become < 80%, which is less than root.parent's fair share, causing preemption to stop. So only one container gets preempted in this round although the need is a lot more. child2 would eventually get to half its fair share but only after multiple rounds of preemption. Solution is to remove preemptContainerPreCheck() in FSParentQueue and keep it only in FSLeafQueue (which is already there). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2209) Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2209: -- Attachment: YARN-2209.3.patch Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart Key: YARN-2209 URL: https://issues.apache.org/jira/browse/YARN-2209 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate application to re-register on RM restart. we should do the same for AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073940#comment-14073940 ] Hadoop QA commented on YARN-2211: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657728/YARN-2211.7.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4421//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4421//console This message is automatically generated. RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens -- Key: YARN-2211 URL: https://issues.apache.org/jira/browse/YARN-2211 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, YARN-2211.4.patch, YARN-2211.5.1.patch, YARN-2211.5.patch, YARN-2211.6.1.patch, YARN-2211.6.patch, YARN-2211.7.1.patch, YARN-2211.7.patch After YARN-2208, AMRMToken can be rolled over periodically. We need to save related Master Keys and use them to recover the AMRMToken when RM restart/failover happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2338) service assemble so complex
[ https://issues.apache.org/jira/browse/YARN-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dingjiaqi updated YARN-2338: Priority: Critical (was: Major) service assemble so complex --- Key: YARN-2338 URL: https://issues.apache.org/jira/browse/YARN-2338 Project: Hadoop YARN Issue Type: Wish Reporter: tangjunjie Priority: Critical See ResourceManager#serviceInit(Configuration): many services are assembled into the ResourceManager by hand there. Consider using Guice or another service-assembly framework to refactor this complex code. -- This message was sent by Atlassian JIRA (v6.2#6252)
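To make the wish concrete, a hedged sketch of what framework-driven assembly could look like with Guice. The service interfaces and module below are invented stand-ins, not actual ResourceManager classes; only the Guice API calls are real.
{code}
import com.google.inject.AbstractModule;
import com.google.inject.Guice;
import com.google.inject.Injector;
import com.google.inject.Singleton;

// Hypothetical service interfaces standing in for the many services that
// ResourceManager#serviceInit wires up inline today.
interface SchedulerService {}
interface AppTrackerService {}

class FifoSchedulerService implements SchedulerService {}
class DefaultAppTrackerService implements AppTrackerService {}

// One possible shape of a module that declares the wiring once, instead of
// constructing every service by hand inside serviceInit().
class RmServicesModule extends AbstractModule {
  @Override
  protected void configure() {
    bind(SchedulerService.class).to(FifoSchedulerService.class).in(Singleton.class);
    bind(AppTrackerService.class).to(DefaultAppTrackerService.class).in(Singleton.class);
  }
}

public class AssemblyExample {
  public static void main(String[] args) {
    Injector injector = Guice.createInjector(new RmServicesModule());
    SchedulerService scheduler = injector.getInstance(SchedulerService.class);
    System.out.println("assembled: " + scheduler.getClass().getSimpleName());
  }
}
{code}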
[jira] [Updated] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima updated YARN-2336: -- Attachment: YARN-2336-3.patch Thanks for the review, [~ajisakaa]. Updated per the comments. - Removed unused variables (leaf1/leaf2) - Adjusted lines to stay under 80 characters - Removed an unused import from FairSchedulerQueueInfo.java Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Kenji Kikushima Assignee: Kenji Kikushima Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336.patch When we have sub queues in Fair Scheduler, the REST API returns JSON with a missing '[' bracket for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073967#comment-14073967 ] Hadoop QA commented on YARN-1994: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657741/YARN-1994.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4423//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4423//console This message is automatically generated. Expose YARN/MR endpoints on multiple interfaces --- Key: YARN-1994 URL: https://issues.apache.org/jira/browse/YARN-1994 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Craig Welch Attachments: YARN-1994.0.patch, YARN-1994.1.patch, YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch YARN and MapReduce daemons currently do not support specifying a wildcard address for the server endpoints. This prevents the endpoints from being accessible from all interfaces on a multihomed machine. Note that if we do specify INADDR_ANY for any of the options, it will break clients as they will attempt to connect to 0.0.0.0. We need a solution that allows specifying a hostname or IP-address for clients while requesting wildcard bind for the servers. (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
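For context on the split between the address clients connect to and the address servers bind, a hedged configuration sketch in Java. yarn.resourcemanager.hostname is an existing key; the *.bind-host style key mirrors what this patch series proposes and is an assumption here, not a setting guaranteed to exist in every release.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class MultihomedRmConfigExample {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Hostname advertised to clients; clients must never be pointed at 0.0.0.0.
    conf.set("yarn.resourcemanager.hostname", "rm.example.com");
    // Wildcard bind for the server sockets so every interface on a multihomed
    // machine accepts connections (key name is an assumption mirroring the
    // proposal in this issue).
    conf.set("yarn.resourcemanager.bind-host", "0.0.0.0");
    System.out.println("advertised: " + conf.get("yarn.resourcemanager.hostname")
        + ", bind: " + conf.get("yarn.resourcemanager.bind-host"));
  }
}
{code}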
[jira] [Assigned] (YARN-2349) InvalidStateTransitonException after RM switch
[ https://issues.apache.org/jira/browse/YARN-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-2349: Assignee: Rohith InvalidStateTransitonException after RM switch -- Key: YARN-2349 URL: https://issues.apache.org/jira/browse/YARN-2349 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Nishan Shetty Assignee: Rohith {code} 2014-07-23 19:22:28,272 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2014-07-23 19:22:28,273 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 45018: starting 2014-07-23 19:22:28,266 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APP_REJECTED at ACCEPTED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:635) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:83) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:706) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:690) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-07-23 19:22:28,283 INFO org.mortbay.log: Stopped SelectChannelConnector@10.18.40.84:45020 2014-07-23 19:22:28,291 ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: Error when openning history file of application application_1406116264351_0007 {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
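For readers unfamiliar with the error in the stack trace, a self-contained illustration of the underlying mechanism: YARN's state machines only accept events for which a transition is registered in the current state, so an APP_REJECTED arriving while the app is already ACCEPTED (as can happen around an RM switch) has nowhere to go. The enums and tiny state table below are made up for the example and are not RMAppImpl's real types or the real StateMachineFactory API.
{code}
import java.util.HashMap;
import java.util.Map;

public class InvalidTransitionExample {
  enum AppState { NEW, ACCEPTED, FAILED }
  enum AppEvent { APP_ACCEPTED, APP_REJECTED }

  // (state, event) -> next state; anything not registered is an invalid event.
  static final Map<AppState, Map<AppEvent, AppState>> TRANSITIONS = new HashMap<>();
  static {
    TRANSITIONS.put(AppState.NEW, new HashMap<>());
    TRANSITIONS.get(AppState.NEW).put(AppEvent.APP_ACCEPTED, AppState.ACCEPTED);
    TRANSITIONS.get(AppState.NEW).put(AppEvent.APP_REJECTED, AppState.FAILED);
    // No APP_REJECTED transition is registered for ACCEPTED.
    TRANSITIONS.put(AppState.ACCEPTED, new HashMap<>());
  }

  static AppState handle(AppState current, AppEvent event) {
    Map<AppEvent, AppState> legal = TRANSITIONS.get(current);
    AppState next = (legal == null) ? null : legal.get(event);
    if (next == null) {
      throw new IllegalStateException("Invalid event: " + event + " at " + current);
    }
    return next;
  }

  public static void main(String[] args) {
    AppState state = handle(AppState.NEW, AppEvent.APP_ACCEPTED); // ACCEPTED
    handle(state, AppEvent.APP_REJECTED); // throws, mirroring the RM log above
  }
}
{code}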
[jira] [Commented] (YARN-2350) TestApplicationMasterServiceOnHA fails with InvalidToken exception
[ https://issues.apache.org/jira/browse/YARN-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073973#comment-14073973 ] Rohith commented on YARN-2350: -- This issue is because of the YARN-2208 check-in. As the complete solution for the AMRMToken roll-over task, YARN-2211 also needs to be checked in, which is in progress. I verified by applying the YARN-2211 patch, and RM HA works fine for me. TestApplicationMasterServiceOnHA fails with InvalidToken exception -- Key: YARN-2350 URL: https://issues.apache.org/jira/browse/YARN-2350 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu From https://builds.apache.org/job/Hadoop-Yarn-trunk/622 : {code} Running org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.591 sec FAILURE! - in org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA testAllocateOnHA(org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA) Time elapsed: 8.408 sec ERROR! org.apache.hadoop.security.token.SecretManager$InvalidToken: Given AMRMToken for application : appattempt_1000_0001_00 seems to have been generated illegally. at org.apache.hadoop.ipc.Client.call(Client.java:1411) at org.apache.hadoop.ipc.Client.call(Client.java:1364) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy85.allocate(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy86.allocate(Unknown Source) at org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA.testAllocateOnHA(TestApplicationMasterServiceOnHA.java:84) {code} This is reproducible locally. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2247) Allow RM web services users to authenticate using delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073981#comment-14073981 ] Zhijie Shen commented on YARN-2247: --- +1 except some nits: 1. I meant RM has the same problem, and we need to do null check {code} +if (testMiniKDC != null) { + testMiniKDC.stop(); +} +rm.stop(); {code} 2. YarnAuthenticationFilter(Initializer) - RMAuthenticationFilter(Initializer) Allow RM web services users to authenticate using delegation tokens --- Key: YARN-2247 URL: https://issues.apache.org/jira/browse/YARN-2247 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Attachments: apache-yarn-2247.0.patch, apache-yarn-2247.1.patch, apache-yarn-2247.2.patch, apache-yarn-2247.3.patch, apache-yarn-2247.4.patch The RM webapp should allow users to authenticate using delegation tokens to maintain parity with RPC. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2209) Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073999#comment-14073999 ] Hadoop QA commented on YARN-2209: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657746/YARN-2209.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4424//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4424//console This message is automatically generated. Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart Key: YARN-2209 URL: https://issues.apache.org/jira/browse/YARN-2209 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate application to re-register on RM restart. we should do the same for AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2355) MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container
Zhijie Shen created YARN-2355: - Summary: MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container Key: YARN-2355 URL: https://issues.apache.org/jira/browse/YARN-2355 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen After YARN-2074, YARN-614 and YARN-611, the application cannot judge whether it has another chance to retry based on MAX_APP_ATTEMPTS_ENV alone. We should be able to notify the application of the up-to-date remaining retry quota. -- This message was sent by Atlassian JIRA (v6.2#6252)
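A minimal sketch of what the issue is about, assuming the env variable keeps its current meaning: today an AM can only read the static attempt cap from its environment and infer "is this my last attempt" from it, which stops being reliable once some failures no longer count against the quota. The helper class and the way the attempt id is passed in are simplifications for illustration; only ApplicationConstants.MAX_APP_ATTEMPTS_ENV is assumed to be the existing constant the title refers to.
{code}
import org.apache.hadoop.yarn.api.ApplicationConstants;

public class LastAttemptCheck {
  // currentAttemptId would normally be derived from the container id in the
  // AM's environment; it is passed in here to keep the example small.
  public static boolean looksLikeLastAttempt(int currentAttemptId) {
    String raw = System.getenv(ApplicationConstants.MAX_APP_ATTEMPTS_ENV);
    int maxAppAttempts = raw == null ? 1 : Integer.parseInt(raw);
    // After YARN-2074/614/611, some attempt failures no longer consume the
    // quota, so this comparison can be wrong; the issue asks the platform to
    // expose the up-to-date remaining retries instead.
    return currentAttemptId >= maxAppAttempts;
  }

  public static void main(String[] args) {
    System.out.println(looksLikeLastAttempt(2));
  }
}
{code}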
[jira] [Commented] (YARN-1336) Work-preserving nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074010#comment-14074010 ] Junping Du commented on YARN-1336: -- Good work, [~jlowe]! Thanks for sharing. Work-preserving nodemanager restart --- Key: YARN-1336 URL: https://issues.apache.org/jira/browse/YARN-1336 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: NMRestartDesignOverview.pdf, YARN-1336-rollup-v2.patch, YARN-1336-rollup.patch This serves as an umbrella ticket for tasks related to work-preserving nodemanager restart. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2313) Livelock can occur in FairScheduler when there are lots of running apps
[ https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074018#comment-14074018 ] Tsuyoshi OZAWA commented on YARN-2313: -- Great. I'll check it. Livelock can occur in FairScheduler when there are lots of running apps --- Key: YARN-2313 URL: https://issues.apache.org/jira/browse/YARN-2313 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.6.0 Attachments: YARN-2313.1.patch, YARN-2313.2.patch, YARN-2313.3.patch, YARN-2313.4.patch, rm-stack-trace.txt Observed a livelock in FairScheduler when there are lots of entries in the queue. After investigating the code, the following case can occur: 1. {{update()}} called by UpdateThread takes longer than UPDATE_INTERVAL (500ms) if there are lots of queues. 2. UpdateThread goes into a busy loop. 3. Other threads (AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
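A simplified, self-contained sketch of the failure mode described in the three steps above (not the actual UpdateThread code): a fixed-interval loop that subtracts the time spent in update() from its sleep stops sleeping entirely once update() exceeds the interval, and if update() holds a lock the whole time, other threads needing the same lock can starve. The loop, lock, and timings are stand-ins chosen for the example.
{code}
public class BusyUpdateLoopExample {
  private static final long UPDATE_INTERVAL_MS = 500;
  private static final Object schedulerLock = new Object();

  public static void main(String[] args) throws InterruptedException {
    Thread updater = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        long start = System.currentTimeMillis();
        synchronized (schedulerLock) {
          simulateExpensiveUpdate();          // takes longer than the interval
        }
        long elapsed = System.currentTimeMillis() - start;
        long sleep = UPDATE_INTERVAL_MS - elapsed;
        if (sleep > 0) {
          try { Thread.sleep(sleep); } catch (InterruptedException e) { return; }
        }
        // sleep <= 0: the loop immediately re-acquires the lock, so it runs
        // hot and any other thread waiting on schedulerLock rarely gets in.
      }
    });
    updater.setDaemon(true);
    updater.start();
    Thread.sleep(2000);
    System.out.println("threads contending on schedulerLock would starve here");
  }

  private static void simulateExpensiveUpdate() {
    try { Thread.sleep(800); } catch (InterruptedException ignored) { }
  }
}
{code}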
[jira] [Commented] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074033#comment-14074033 ] Hadoop QA commented on YARN-2336: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657750/YARN-2336-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4425//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4425//console This message is automatically generated. Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Kenji Kikushima Assignee: Kenji Kikushima Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336.patch When we have sub queues in Fair Scheduler, REST api returns a missing '[' blacket JSON for childQueues. This issue found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2214) preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness
[ https://issues.apache.org/jira/browse/YARN-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated YARN-2214: - Attachment: YARN-2214-v2.txt preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness --- Key: YARN-2214 URL: https://issues.apache.org/jira/browse/YARN-2214 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: YARN-2214-v1.txt, YARN-2214-v2.txt preemptContainerPreCheck() in FSParentQueue rejects preemption requests if the parent queue is below fair share. This can cause a delay in converging towards fairness when the starved leaf queue and the queue above fairshare belong under a non-root parent queue(ie their least common ancestor is a parent queue which is not root). Here is an example : root.parent has fair share = 80% and usage = 80% root.parent.child1 has fair share =40% usage = 80% root.parent.child2 has fair share=40% usage=0% Now a job is submitted to child2 and the demand is 40%. Preemption will kick in and try to reclaim all the 40% from child1. When it preempts the first container from child1,the usage of root.parent will become 80%, which is less than root.parent's fair share,causing preemption to stop.So only one container gets preempted in this round although the need is a lot more. child2 would eventually get to half its fair share but only after multiple rounds of preemption. Solution is to remove preemptContainerPreCheck() in FSParentQueue and keep it only in FSLeafQueue(which is already there). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2214) preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness
[ https://issues.apache.org/jira/browse/YARN-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074039#comment-14074039 ] Ashwin Shankar commented on YARN-2214: -- Thanks [~kasha] ! Patch refreshed. preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness --- Key: YARN-2214 URL: https://issues.apache.org/jira/browse/YARN-2214 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: YARN-2214-v1.txt, YARN-2214-v2.txt preemptContainerPreCheck() in FSParentQueue rejects preemption requests if the parent queue is below fair share. This can cause a delay in converging towards fairness when the starved leaf queue and the queue above fairshare belong under a non-root parent queue(ie their least common ancestor is a parent queue which is not root). Here is an example : root.parent has fair share = 80% and usage = 80% root.parent.child1 has fair share =40% usage = 80% root.parent.child2 has fair share=40% usage=0% Now a job is submitted to child2 and the demand is 40%. Preemption will kick in and try to reclaim all the 40% from child1. When it preempts the first container from child1,the usage of root.parent will become 80%, which is less than root.parent's fair share,causing preemption to stop.So only one container gets preempted in this round although the need is a lot more. child2 would eventually get to half its fair share but only after multiple rounds of preemption. Solution is to remove preemptContainerPreCheck() in FSParentQueue and keep it only in FSLeafQueue(which is already there). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2247) Allow RM web services users to authenticate using delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2247: Attachment: apache-yarn-2247.5.patch {quote} 1. I meant RM has the same problem, and we need to do null check {noformat} +if (testMiniKDC != null) { + testMiniKDC.stop(); +} +rm.stop(); {noformat} {quote} Got it. Fixed. {quote} 2. YarnAuthenticationFilter(Initializer) - RMAuthenticationFilter(Initializer) {quote} Fixed. Allow RM web services users to authenticate using delegation tokens --- Key: YARN-2247 URL: https://issues.apache.org/jira/browse/YARN-2247 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Attachments: apache-yarn-2247.0.patch, apache-yarn-2247.1.patch, apache-yarn-2247.2.patch, apache-yarn-2247.3.patch, apache-yarn-2247.4.patch, apache-yarn-2247.5.patch The RM webapp should allow users to authenticate using delegation tokens to maintain parity with RPC. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart
[ https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074063#comment-14074063 ] Jian He commented on YARN-2229: --- bq. ContainerTokenIdentifier serializes a long (getContainerId()) at RM side, but deserializes a int (getId()) at NM side. In this case, I'm afraid it's going to be wrong. ContainerToken compatibility can not be ensured until rolling upgrades is completed as mentioned here https://issues.apache.org/jira/browse/YARN-2152?focusedCommentId=14061366page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14061366 Earlier the id integer returned by getId is supposed to be a monotonically increasing integer and some application logic may depend on that (e.g. sort containers based on the id integer), the problem with the approach of adding a new field is that multiple containers may have the same id integer after RM restarts. Thoughts? ContainerId can overflow with RM restart Key: YARN-2229 URL: https://issues.apache.org/jira/browse/YARN-2229 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2229.1.patch, YARN-2229.10.patch, YARN-2229.10.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, YARN-2229.8.patch, YARN-2229.9.patch On YARN-2052, we changed containerId format: upper 10 bits are for epoch, lower 22 bits are for sequence number of Ids. This is for preserving semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and {{ConverterUtils#toContainerId}}. One concern is epoch can overflow after RM restarts 1024 times. To avoid the problem, its better to make containerId long. We need to define the new format of container Id with preserving backward compatibility on this JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
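To make the overflow concrete, a small worked example of the packing described in the issue (10-bit epoch plus 22-bit sequence inside the 32-bit id). The 40-bit split shown for the long variant is only one illustrative layout, not the final committed format.
{code}
public class ContainerIdPackingExample {
  public static void main(String[] args) {
    int seqBits = 22;
    int seq = 42;

    int packedEpoch3 = (3 << seqBits) | seq;       // 12_582_954, fits comfortably
    int packedEpoch0 = (0 << seqBits) | seq;       // 42
    // Epoch 1024 needs an 11th bit, so in a 32-bit id the shift wraps and the
    // value collides with an id from the very first RM generation.
    int packedEpoch1024 = (1024 << seqBits) | seq;
    System.out.println("epoch 3  -> " + packedEpoch3);
    System.out.println("epoch 1024 collides with epoch 0: "
        + (packedEpoch1024 == packedEpoch0));      // true

    // With a long id there is far more headroom, e.g. 40 bits of sequence
    // below the epoch (an illustrative layout only).
    long widePacked = ((long) 1024 << 40) | seq;
    System.out.println("long layout packs epoch 1024 to " + widePacked);
  }
}
{code}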
[jira] [Commented] (YARN-2214) preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness
[ https://issues.apache.org/jira/browse/YARN-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074073#comment-14074073 ] Hadoop QA commented on YARN-2214: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657768/YARN-2214-v2.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4426//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4426//console This message is automatically generated. preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness --- Key: YARN-2214 URL: https://issues.apache.org/jira/browse/YARN-2214 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: YARN-2214-v1.txt, YARN-2214-v2.txt preemptContainerPreCheck() in FSParentQueue rejects preemption requests if the parent queue is below fair share. This can cause a delay in converging towards fairness when the starved leaf queue and the queue above fairshare belong under a non-root parent queue(ie their least common ancestor is a parent queue which is not root). Here is an example : root.parent has fair share = 80% and usage = 80% root.parent.child1 has fair share =40% usage = 80% root.parent.child2 has fair share=40% usage=0% Now a job is submitted to child2 and the demand is 40%. Preemption will kick in and try to reclaim all the 40% from child1. When it preempts the first container from child1,the usage of root.parent will become 80%, which is less than root.parent's fair share,causing preemption to stop.So only one container gets preempted in this round although the need is a lot more. child2 would eventually get to half its fair share but only after multiple rounds of preemption. Solution is to remove preemptContainerPreCheck() in FSParentQueue and keep it only in FSLeafQueue(which is already there). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2247) Allow RM web services users to authenticate using delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074082#comment-14074082 ] Hadoop QA commented on YARN-2247: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657770/apache-yarn-2247.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4427//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4427//console This message is automatically generated. Allow RM web services users to authenticate using delegation tokens --- Key: YARN-2247 URL: https://issues.apache.org/jira/browse/YARN-2247 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Attachments: apache-yarn-2247.0.patch, apache-yarn-2247.1.patch, apache-yarn-2247.2.patch, apache-yarn-2247.3.patch, apache-yarn-2247.4.patch, apache-yarn-2247.5.patch The RM webapp should allow users to authenticate using delegation tokens to maintain parity with RPC. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074094#comment-14074094 ] Wangda Tan commented on YARN-415: - Hi Eric, Thanks for updating your patch again. *To your comments:* bq. I was able to remove the rmApps variable, but I had to leave the check for app != null because if I try to take that out, several unit tests would fail with NullPointerException. Even with removing the rmApps variable, I needed to change TestRMContainerImpl.java to mock rmContext.getRMApps(). I would suggest fixing such UTs instead of inserting kernel code just to make the UTs pass. I'm not sure about the effort of doing this; if the effort is still reasonable, we should do it. bq. I'm still working on the unit tests as you suggested, but I wanted to get the rest of the patch up first so you can look at it No problem :), I can review your existing changes in the meantime. *I've reviewed some details of your patch; one very minor comment:* ApplicationCLI.java {code} + appReportStr.print(\tResources used : ); {code} Do we need to change it to Resource Utilization as well? I think the rest of the patch almost LGTM; looking forward to your new patch containing an integration test. Thanks, Wangda Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
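A worked example of the MB-seconds formula quoted in the description (reserved memory of each container multiplied by its lifetime, summed over all containers). The container sizes and lifetimes below are made-up numbers chosen for the arithmetic.
{code}
public class MemorySecondsExample {
  public static void main(String[] args) {
    // {reserved MB, lifetime in seconds} per container -- made-up values.
    long[][] containers = { {2048, 120}, {1024, 300}, {4096, 45} };

    long mbSeconds = 0;
    for (long[] c : containers) {
      mbSeconds += c[0] * c[1];   // reserved RAM * lifetime, per the formula above
    }
    // 2048*120 + 1024*300 + 4096*45 = 245760 + 307200 + 184320 = 737280
    System.out.println("memory utilization = " + mbSeconds + " MB-seconds");
  }
}
{code}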
[jira] [Commented] (YARN-2298) Move TimelineClient to yarn-common
[ https://issues.apache.org/jira/browse/YARN-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074104#comment-14074104 ] Jian He commented on YARN-2298: --- Maybe it's better to add a dependency on the yarn-client module instead of moving the code, as the client libraries are split into separate packages. A similar problem happens if the RM wants to use NMClient to launch an AM. Move TimelineClient to yarn-common -- Key: YARN-2298 URL: https://issues.apache.org/jira/browse/YARN-2298 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2298.1.patch To allow the RM to reuse the timeline client code, we have to move it out of the yarn-client module, due to maven dependency issues. -- This message was sent by Atlassian JIRA (v6.2#6252)