[jira] [Commented] (YARN-7265) Hadoop Server Log Correlation
    [ https://issues.apache.org/jira/browse/YARN-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357531#comment-16357531 ]

Arun C Murthy commented on YARN-7265:
--------------------------------------

[~tanping] - Like [~jlowe] suggested, separating this from YARN makes sense. For example, Ambari has "log search"; leveraging that would be a better option. Makes sense? Thanks.

> Hadoop Server Log Correlation
> -----------------------------
>
> Key: YARN-7265
> URL: https://issues.apache.org/jira/browse/YARN-7265
> Project: Hadoop YARN
> Issue Type: Wish
> Components: log-aggregation
> Reporter: Tanping Wang
> Priority: Major
>
> Hadoop has many server logs: YARN task logs, NodeManager logs, HDFS logs, and so on. There are also many different ways to expose the logs, build relationships horizontally to correlate the logs, or search the logs by keyword. There is a need for a default yet convenient log-analytics mechanism in Hadoop itself that at least covers all of Hadoop's server logs. Such a system could correlate the Hadoop server logs by grouping them along various dimensions, including application ID, task ID, job ID or node ID. The correlated raw logs could then be easily accessed by the application developer or cluster administrator via a web page for managing and debugging.
[jira] [Reopened] (YARN-1126) Add validation of users input nodes-states options to nodes CLI
    [ https://issues.apache.org/jira/browse/YARN-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy reopened YARN-1126:
---------------------------------

I'm re-opening this to commit the addendum patch from YARN-905 (https://issues.apache.org/jira/secure/attachment/12606009/YARN-905-addendum.patch) since the other jira already went out in 2.3.0. Targeting this for 2.7.0.

> Add validation of users input nodes-states options to nodes CLI
> ----------------------------------------------------------------
>
> Key: YARN-1126
> URL: https://issues.apache.org/jira/browse/YARN-1126
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Wei Yan
> Assignee: Wei Yan
>
> Following the discussion in YARN-905: (1) case-insensitive checks for all states; (2) validation of the user's input, exiting with a non-zero code and printing all valid states when the user gives an invalid state.
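[Editor's note] A minimal sketch of the parsing and validation behavior described in YARN-1126 above, assuming the stock NodeState enum; the helper class name and error text are illustrative, not from the actual patch:

{code:java}
import java.util.Arrays;
import org.apache.hadoop.yarn.api.records.NodeState;

// Hypothetical helper: parse a user-supplied state case-insensitively and,
// on bad input, print every valid state and exit non-zero, as proposed.
public final class NodeStateArg {
  public static NodeState parse(String arg) {
    try {
      return NodeState.valueOf(arg.toUpperCase());
    } catch (IllegalArgumentException e) {
      System.err.println("Invalid node state '" + arg + "'. Valid states: "
          + Arrays.toString(NodeState.values()));
      System.exit(1);
      return null; // unreachable
    }
  }
}
{code}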
[jira] [Commented] (YARN-893) Capacity scheduler allocates vcores to containers but does not report it in headroom
    [ https://issues.apache.org/jira/browse/YARN-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308705#comment-14308705 ]

Arun C Murthy commented on YARN-893:
-------------------------------------

[~kj-ki] [~ozawa] - the {{DefaultResourceCalculator}} is meant to not use vcores by design - if vcores is desired, one should use the {{DominantResourceCalculator}}. Makes sense?

> Capacity scheduler allocates vcores to containers but does not report it in headroom
> -------------------------------------------------------------------------------------
>
> Key: YARN-893
> URL: https://issues.apache.org/jira/browse/YARN-893
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.1.0-beta, 2.3.0
> Reporter: Bikas Saha
> Assignee: Kenji Kikushima
> Attachments: YARN-893-2.patch, YARN-893.patch
>
> In non-DRF mode, it reports 0 vcores in the headroom but it allocates 1 vcore to containers.
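[Editor's note] A small self-contained sketch of the distinction drawn in the comment above, assuming the stock calculator classes; it shows why vcores drop out of headroom math under the default calculator:

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class CalculatorDemo {
  public static void main(String[] args) {
    Resource available = Resources.createResource(8192, 4);     // 8GB, 4 vcores
    Resource perContainer = Resources.createResource(1024, 4);  // 1GB, 4 vcores

    ResourceCalculator def = new DefaultResourceCalculator();
    ResourceCalculator drf = new DominantResourceCalculator();

    // Memory-only math: 8192/1024 = 8 containers; vcores are ignored.
    System.out.println(def.computeAvailableContainers(available, perContainer));
    // Dominant-resource math: min(8192/1024, 4/4) = 1 container.
    System.out.println(drf.computeAvailableContainers(available, perContainer));
  }
}
{code}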
[jira] [Commented] (YARN-1449) Protocol changes and implementations in NM side to support change container resource
    [ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308786#comment-14308786 ]

Arun C Murthy commented on YARN-1449:
--------------------------------------

Cleaning up stale PA (Patch Available) patches.

> Protocol changes and implementations in NM side to support change container resource
> -------------------------------------------------------------------------------------
>
> Key: YARN-1449
> URL: https://issues.apache.org/jira/browse/YARN-1449
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Affects Versions: 2.2.0
> Reporter: Wangda Tan (No longer used)
> Assignee: Wangda Tan (No longer used)
> Attachments: yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, yarn-1449.5.patch
>
> As described in YARN-1197, we need to add the following API/implementation changes:
> 1) Add a changeContainersResources method to ContainerManagementProtocol.
> 2) Return the succeeded/failed increased/decreased containers in the response of changeContainersResources.
> 3) Add a new decreased-containers field to NodeStatus, which can help the NM notify the RM of such changes.
> 4) Add a changeContainersResources implementation in ContainerManagerImpl.
> 5) Add changes in ContainersMonitorImpl to support changing the resource limit of containers.
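[Editor's note] To make items (1)-(2) above concrete, here is a purely hypothetical sketch of the shapes the JIRA text describes; these are not the real generated protobuf records, and all names follow the JIRA prose rather than any finalized API:

{code:java}
import java.util.List;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.Token;

// Hypothetical shapes only: the real request/response types would be
// generated PB records under org.apache.hadoop.yarn.api.protocolrecords.
interface ChangeContainersResourcesRequest {
  List<Token> getContainersToIncrease();      // increases authorized via tokens
  List<ContainerId> getContainersToDecrease();
}

interface ChangeContainersResourcesResponse {
  List<ContainerId> getSucceededChangedContainers();
  List<ContainerId> getFailedChangedContainers();
}

// Item (1): the new method the NM-side protocol would gain.
interface ContainerManagementProtocolAddition {
  ChangeContainersResourcesResponse changeContainersResources(
      ChangeContainersResourcesRequest request) throws Exception;
}
{code}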
[jira] [Updated] (YARN-2731) RegisterApplicationMasterResponsePBImpl: not properly initialized builder
    [ https://issues.apache.org/jira/browse/YARN-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-2731:
--------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> RegisterApplicationMasterResponsePBImpl: not properly initialized builder
> --------------------------------------------------------------------------
>
> Key: YARN-2731
> URL: https://issues.apache.org/jira/browse/YARN-2731
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Fix For: 2.7.0
> Attachments: YARN-2731.patch
>
> If I am not mistaken, in RegisterApplicationMasterResponsePBImpl we fail to initialize the builder in setNMTokensFromPreviousAttempts(), and we initialize the builder in the wrong place in setClientToAMTokenMasterKey().
[jira] [Updated] (YARN-1142) MiniYARNCluster web ui does not work properly
    [ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-1142:
--------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> MiniYARNCluster web ui does not work properly
> ----------------------------------------------
>
> Key: YARN-1142
> URL: https://issues.apache.org/jira/browse/YARN-1142
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.1.0-beta
> Reporter: Alejandro Abdelnur
> Fix For: 2.7.0
>
> When going to the RM HTTP port, the NM web UI is displayed. It seems there is a singleton somewhere that breaks things when the RM and NMs run in the same process.
[jira] [Updated] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
    [ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-1514:
--------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
> ---------------------------------------------------------------------
>
> Key: YARN-1514
> URL: https://issues.apache.org/jira/browse/YARN-1514
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Tsuyoshi OZAWA
> Assignee: Tsuyoshi OZAWA
> Fix For: 2.7.0
> Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch
>
> ZKRMStateStore is very sensitive to ZNode-related operations, as discussed in YARN-1307, YARN-1378 and so on. In particular, ZKRMStateStore#loadState is called when an RM-HA cluster fails over, so its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool.
[jira] [Updated] (YARN-1723) AMRMClientAsync missing blacklist addition and removal functionality
    [ https://issues.apache.org/jira/browse/YARN-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-1723:
--------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> AMRMClientAsync missing blacklist addition and removal functionality
> ---------------------------------------------------------------------
>
> Key: YARN-1723
> URL: https://issues.apache.org/jira/browse/YARN-1723
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: 2.2.0
> Reporter: Bikas Saha
> Fix For: 2.7.0
[jira] [Updated] (YARN-2280) Resource manager web service fields are not accessible
    [ https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-2280:
--------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> Resource manager web service fields are not accessible
> -------------------------------------------------------
>
> Key: YARN-2280
> URL: https://issues.apache.org/jira/browse/YARN-2280
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.4.0, 2.4.1
> Reporter: Krisztian Horvath
> Assignee: Krisztian Horvath
> Priority: Trivial
> Fix For: 2.7.0
> Attachments: YARN-2280.patch
>
> Using the resource manager's REST API (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some REST calls return a class whose fields cannot be accessed after unmarshalling, for example SchedulerTypeInfo -> schedulerInfo. Using the same classes on the client side, these fields are only accessible via reflection.
[jira] [Updated] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location
    [ https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-314:
-------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> Schedulers should allow resource requests of different sizes at the same priority and location
> -----------------------------------------------------------------------------------------------
>
> Key: YARN-314
> URL: https://issues.apache.org/jira/browse/YARN-314
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Affects Versions: 2.0.2-alpha
> Reporter: Sandy Ryza
> Assignee: Karthik Kambatla
> Fix For: 2.7.0
> Attachments: yarn-314-prelim.patch
>
> Currently, resource requests for the same container and locality are expected to all be the same size. While it doesn't look like it's needed for apps currently, and it can be circumvented by specifying different priorities if absolutely necessary, it seems to me that the ability to request containers with different resource requirements at the same priority level should be there for the future and for completeness' sake.
[jira] [Updated] (YARN-1156) Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values
    [ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-1156:
--------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values
> ------------------------------------------------------------------------------
>
> Key: YARN-1156
> URL: https://issues.apache.org/jira/browse/YARN-1156
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.1.0-beta
> Reporter: Akira AJISAKA
> Assignee: Tsuyoshi OZAWA
> Priority: Minor
> Labels: metrics, newbie
> Fix For: 2.7.0
> Attachments: YARN-1156.1.patch, YARN-1156.2.patch
>
> The AllocatedGB and AvailableGB metrics are currently integer-typed. If 500MB of memory is allocated to containers four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, 2000MB of memory has actually been allocated, but the metric shows 0GB. Let's use a float type for these metrics.
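[Editor's note] The arithmetic at fault is plain integer division; a self-contained illustration of the description above (not the actual metrics code):

{code:java}
public class AllocatedGbDemo {
  public static void main(String[] args) {
    int allocatedGB = 0;
    float allocatedGBFloat = 0f;
    for (int i = 0; i < 4; i++) {       // four 500MB allocations
      allocatedGB += 500 / 1024;        // int division: adds 0 each time
      allocatedGBFloat += 500 / 1024f;  // float division: adds ~0.488
    }
    System.out.println(allocatedGB);      // 0, even though 2000MB is allocated
    System.out.println(allocatedGBFloat); // ~1.95
  }
}
{code}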
[jira] [Updated] (YARN-745) Move UnmanagedAMLauncher to yarn client package
    [ https://issues.apache.org/jira/browse/YARN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-745:
-------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> Move UnmanagedAMLauncher to yarn client package
> ------------------------------------------------
>
> Key: YARN-745
> URL: https://issues.apache.org/jira/browse/YARN-745
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Fix For: 2.7.0
>
> It's currently sitting in the yarn applications project, which sounds wrong. The client project sounds better, since it contains the utilities/libraries that clients use to write and debug YARN applications.
[jira] [Updated] (YARN-1334) YARN should give more info on errors when running failed distributed shell command
    [ https://issues.apache.org/jira/browse/YARN-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-1334:
--------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> YARN should give more info on errors when running failed distributed shell command
> -----------------------------------------------------------------------------------
>
> Key: YARN-1334
> URL: https://issues.apache.org/jira/browse/YARN-1334
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: applications/distributed-shell
> Affects Versions: 2.3.0
> Reporter: Tassapol Athiapinya
> Assignee: Xuan Gong
> Fix For: 2.7.0
> Attachments: YARN-1334.1.patch
>
> Running an incorrect command such as:
> /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar <distributedshell jar> -shell_command ./test1.sh -shell_script ./
> would show a shell exit code exception with no useful message. It should print out the sysout/syserr of the containers/AM to explain why it is failing.
[jira] [Updated] (YARN-650) User guide for preemption
    [ https://issues.apache.org/jira/browse/YARN-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-650:
-------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> User guide for preemption
> --------------------------
>
> Key: YARN-650
> URL: https://issues.apache.org/jira/browse/YARN-650
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: documentation
> Reporter: Chris Douglas
> Priority: Minor
> Fix For: 2.7.0
> Attachments: Y650-0.patch
>
> YARN-45 added a protocol for the RM to ask for resources back. The docs on writing YARN applications should include a section on how to interpret this message.
[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue
    [ https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-2113:
--------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ----------------------------------------------------------------
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Fix For: 2.7.0
>
> Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner.
[jira] [Updated] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container
    [ https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-1621:
--------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> Add CLI to list rows of task attempt ID, container ID, host of container, state of container
> ---------------------------------------------------------------------------------------------
>
> Key: YARN-1621
> URL: https://issues.apache.org/jira/browse/YARN-1621
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.2.0
> Reporter: Tassapol Athiapinya
> Fix For: 2.7.0
> Attachments: YARN-1621.1.patch
>
> As more applications are moved to YARN, we need a generic CLI to list rows of task attempt ID, container ID, host of container, and state of container. Today, if a YARN application running in a container hangs, there is no way to find out more because the user does not know where each attempt is running. For each running application, it is useful to differentiate between running/succeeded/failed/killed containers.
> {code:title=proposed yarn cli}
> $ yarn application -list-containers -applicationId <appId> [-containerState <state of container>]
> where containerState is an optional filter to list containers in the given state only.
> container state can be running/succeeded/killed/failed/all.
> A user can specify more than one container state at once, e.g. KILLED,FAILED.
> task attempt ID  container ID  host of container  state of container
> {code}
> The CLI should work with both running and completed applications. If a container runs many task attempts, all attempts should be shown. That will likely be the case for a Tez container-reuse application.
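[Editor's note] A hypothetical invocation of the proposed command, with made-up application, attempt, container IDs and hostnames, showing the four columns the JIRA asks for:

{code}
$ yarn application -list-containers -applicationId application_1398289159710_0001 -containerState RUNNING
attempt_1398289159710_0001_m_000000_0  container_1398289159710_0001_01_000002  host1.example.com:45454  RUNNING
attempt_1398289159710_0001_m_000001_0  container_1398289159710_0001_01_000003  host2.example.com:45454  RUNNING
{code}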
[jira] [Updated] (YARN-2483) TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails due to incorrect AppAttempt state
    [ https://issues.apache.org/jira/browse/YARN-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-2483:
--------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails due to incorrect AppAttempt state
> -------------------------------------------------------------------------------------------------
>
> Key: YARN-2483
> URL: https://issues.apache.org/jira/browse/YARN-2483
> Project: Hadoop YARN
> Issue Type: Test
> Reporter: Ted Yu
> Fix For: 2.7.0
>
> From https://builds.apache.org/job/Hadoop-Yarn-trunk/665/console :
> {code}
> testShouldNotCountFailureToMaxAttemptRetry(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart)  Time elapsed: 49.686 sec  <<< FAILURE!
> java.lang.AssertionError: AppAttempt state is not correct (timedout) expected:<ALLOCATED> but was:<SCHEDULED>
>         at org.junit.Assert.fail(Assert.java:88)
>         at org.junit.Assert.failNotEquals(Assert.java:743)
>         at org.junit.Assert.assertEquals(Assert.java:118)
>         at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:84)
>         at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:417)
>         at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:582)
>         at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:589)
>         at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForNewAMToLaunchAndRegister(MockRM.java:182)
>         at org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:402)
> {code}
> TestApplicationMasterLauncher#testallocateBeforeAMRegistration fails with a similar cause. These tests failed in build #664 as well.
[jira] [Updated] (YARN-1234) Container localizer logs are not created in secured cluster
    [ https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-1234:
--------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> Container localizer logs are not created in secured cluster
> ------------------------------------------------------------
>
> Key: YARN-1234
> URL: https://issues.apache.org/jira/browse/YARN-1234
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Omkar Vinit Joshi
> Assignee: Omkar Vinit Joshi
> Fix For: 2.7.0
>
> When we run the ContainerLocalizer in a secured cluster, we potentially do not create any log file to track log messages. Such a log file would be helpful in identifying ContainerLocalization issues in a secured cluster.
[jira] [Updated] (YARN-308) Improve documentation about what asks means in AMRMProtocol
    [ https://issues.apache.org/jira/browse/YARN-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-308:
-------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> Improve documentation about what asks means in AMRMProtocol
> ------------------------------------------------------------
>
> Key: YARN-308
> URL: https://issues.apache.org/jira/browse/YARN-308
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api, documentation, resourcemanager
> Affects Versions: 2.0.2-alpha
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
> Fix For: 2.7.0
> Attachments: YARN-308.patch
>
> It's unclear to me from reading the javadoc exactly what "asks" means when the AM sends a heartbeat to the RM. Is the AM supposed to send a list of all the resources it is waiting for? Or just inform the RM about new ones that it wants?
[jira] [Updated] (YARN-1477) Improve AM web UI to avoid confusion about AM restart
    [ https://issues.apache.org/jira/browse/YARN-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-1477:
--------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> Improve AM web UI to avoid confusion about AM restart
> ------------------------------------------------------
>
> Key: YARN-1477
> URL: https://issues.apache.org/jira/browse/YARN-1477
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.2.0
> Reporter: Chen He
> Assignee: Chen He
> Labels: features
> Fix For: 2.7.0
>
> Improve the AM web UI: add a submitTime field to the AM's web services REST API, improve the "Elapsed:" row's time computation, etc.
[jira] [Updated] (YARN-965) NodeManager Metrics containersRunning is not correct When localizing container process is failed or killed
    [ https://issues.apache.org/jira/browse/YARN-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-965:
-------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> NodeManager Metrics containersRunning is not correct When localizing container process is failed or killed
> -----------------------------------------------------------------------------------------------------------
>
> Key: YARN-965
> URL: https://issues.apache.org/jira/browse/YARN-965
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.0.4-alpha
> Environment: suse linux
> Reporter: Li Yuan
> Fix For: 2.7.0
>
> When a container is successfully launched, its state goes from LOCALIZED to RUNNING and containersRunning is incremented. When the state goes from EXITED_WITH_FAILURE or KILLING to DONE, containersRunning is decremented. However, EXITED_WITH_FAILURE or KILLING can be reached from LOCALIZING (or LOCALIZED), not just RUNNING, which causes containersRunning to be less than the actual number. Furthermore, the metrics become inconsistent:
> containersLaunched != containersCompleted + containersFailed + containersKilled + containersRunning + containersIniting
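[Editor's note] An illustrative sketch of the bookkeeping the description implies (not the real NodeManagerMetrics class): the decrement must be conditional on the container having actually reached RUNNING.

{code:java}
// Hypothetical counter bookkeeping for the bug in YARN-965 above.
class ContainerCounters {
  int running;

  void onStarted() {            // LOCALIZED -> RUNNING
    running++;
  }

  void onDone(boolean wasRunning) {  // EXITED_WITH_FAILURE/KILLING -> DONE
    if (wasRunning) {
      running--;  // skip the decrement for containers that died while localizing
    }
  }
}
{code}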
[jira] [Updated] (YARN-153) PaaS on YARN: a YARN application to demonstrate that YARN can be used as a PaaS
    [ https://issues.apache.org/jira/browse/YARN-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-153:
-------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> PaaS on YARN: a YARN application to demonstrate that YARN can be used as a PaaS
> --------------------------------------------------------------------------------
>
> Key: YARN-153
> URL: https://issues.apache.org/jira/browse/YARN-153
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Jacob Jaigak Song
> Assignee: Jacob Jaigak Song
> Fix For: 2.7.0
> Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, MAPREDUCE4393.patch
> Original Estimate: 336h
> Time Spent: 336h
> Remaining Estimate: 0h
>
> This application is to demonstrate that YARN can be used for non-MapReduce applications. As Hadoop has already been widely adopted and deployed, and its deployment will only increase, we thought it has good potential to be used as a PaaS. I have implemented a proof of concept to demonstrate that YARN can be used as a PaaS (Platform as a Service). I have done a gap analysis against VMware's Cloud Foundry and tried to achieve as many PaaS functionalities as possible on YARN. I'd like to check in this POC as a YARN example application.
[jira] [Updated] (YARN-1147) Add end-to-end tests for HA
    [ https://issues.apache.org/jira/browse/YARN-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-1147:
--------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> Add end-to-end tests for HA
> ----------------------------
>
> Key: YARN-1147
> URL: https://issues.apache.org/jira/browse/YARN-1147
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.1.0-beta
> Reporter: Karthik Kambatla
> Assignee: Xuan Gong
> Fix For: 2.7.0
>
> While individual sub-tasks add tests for the code they include, it will be handy to write end-to-end tests for HA, including some stress testing.
[jira] [Updated] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
    [ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-160:
-------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> nodemanagers should obtain cpu/memory values from underlying OS
> ----------------------------------------------------------------
>
> Key: YARN-160
> URL: https://issues.apache.org/jira/browse/YARN-160
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 2.0.3-alpha
> Reporter: Alejandro Abdelnur
> Assignee: Varun Vasudev
> Fix For: 2.7.0
> Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch
>
> As mentioned in YARN-2 (*NM memory and CPU configs*), these values currently come from the NM's config; we should be able to obtain them from the OS (i.e., in the case of Linux, from /proc/meminfo and /proc/cpuinfo). As this is highly OS-dependent, we should have an interface that obtains this information. In addition, implementations of this interface should be able to specify a mem/cpu offset (an amount of mem/cpu not to be made available as YARN resources); this would allow reserving mem/cpu for the OS and other services outside of YARN containers.
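[Editor's note] A minimal sketch of the Linux side of such an interface, under the assumption stated in the JIRA (read /proc/meminfo directly); the class name is illustrative, and a real implementation would also apply the offsets discussed above:

{code:java}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class LinuxNodeResources {
  // Parse total physical memory, e.g. the line "MemTotal:  16326820 kB".
  public static long totalMemoryKB() throws IOException {
    try (BufferedReader r = new BufferedReader(new FileReader("/proc/meminfo"))) {
      String line;
      while ((line = r.readLine()) != null) {
        if (line.startsWith("MemTotal:")) {
          return Long.parseLong(line.replaceAll("\\D+", ""));
        }
      }
    }
    throw new IOException("MemTotal not found in /proc/meminfo");
  }

  // Core count is simpler to get from the JVM than from /proc/cpuinfo.
  public static int availableCores() {
    return Runtime.getRuntime().availableProcessors();
  }
}
{code}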
[jira] [Updated] (YARN-113) WebAppProxyServlet must use SSLFactory for the HttpClient connections
    [ https://issues.apache.org/jira/browse/YARN-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-113:
-------------------------------
    Fix Version/s:     (was: 2.6.0)
                       2.7.0

> WebAppProxyServlet must use SSLFactory for the HttpClient connections
> ----------------------------------------------------------------------
>
> Key: YARN-113
> URL: https://issues.apache.org/jira/browse/YARN-113
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.0.3-alpha
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Fix For: 2.7.0
>
> The HttpClient must be configured to use the SSLFactory when the web UIs are over HTTPS; otherwise the proxy servlet fails to connect to the AM because of unknown (self-signed) certificates.
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
    [ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220047#comment-14220047 ]

Arun C Murthy commented on YARN-2139:
--------------------------------------

Sorry, been busy with 2.6.0 - just coming up for air.

What are we modeling with vdisk again? What is the metric? Is it directly the blkio parameter? If so, that is my biggest concern.

> [Umbrella] Support for Disk as a Resource in YARN
> --------------------------------------------------
>
> Key: YARN-2139
> URL: https://issues.apache.org/jira/browse/YARN-2139
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Wei Yan
> Attachments: Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, YARN-2139-prototype.patch
>
> YARN should consider disk as another resource for (1) scheduling tasks on nodes, (2) isolation at runtime, (3) spindle locality.
[jira] [Updated] (YARN-2635) TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS
    [ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-2635:
--------------------------------
    Fix Version/s:     (was: 2.7.0)
                       2.6.0

> TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS
> ---------------------------------------------------------------------------
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Wei Yan
> Assignee: Wei Yan
> Fix For: 2.6.0
> Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch, yarn-2635-4.patch
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, TestRMRestart would fail.
[jira] [Commented] (YARN-2853) Killing app may hang while AM is unregistering
    [ https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210143#comment-14210143 ]

Arun C Murthy commented on YARN-2853:
--------------------------------------

I've merged this back into branch-2.6 for hadoop-2.6.0-rc1.

> Killing app may hang while AM is unregistering
> -----------------------------------------------
>
> Key: YARN-2853
> URL: https://issues.apache.org/jira/browse/YARN-2853
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Jian He
> Assignee: Jian He
> Fix For: 2.6.0
> Attachments: YARN-2853.1.patch, YARN-2853.1.patch, YARN-2853.2.patch, YARN-2853.3.patch
>
> When killing an app, the app first moves to the KILLING state. If the RMAppAttempt receives the attempt_unregister event before the attempt_kill event, it'll ignore the later attempt_kill event. Hence, the RMApp won't be able to move to the KILLED state and stays at KILLING forever.
[jira] [Updated] (YARN-2853) Killing app may hang while AM is unregistering
    [ https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-2853:
--------------------------------
    Fix Version/s:     (was: 2.7.0)
                       2.6.0

> Killing app may hang while AM is unregistering
> -----------------------------------------------
>
> Key: YARN-2853
> URL: https://issues.apache.org/jira/browse/YARN-2853
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Jian He
> Assignee: Jian He
> Fix For: 2.6.0
> Attachments: YARN-2853.1.patch, YARN-2853.1.patch, YARN-2853.2.patch, YARN-2853.3.patch
>
> When killing an app, the app first moves to the KILLING state. If the RMAppAttempt receives the attempt_unregister event before the attempt_kill event, it'll ignore the later attempt_kill event. Hence, the RMApp won't be able to move to the KILLED state and stays at KILLING forever.
[jira] [Commented] (YARN-2635) TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS
    [ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210142#comment-14210142 ]

Arun C Murthy commented on YARN-2635:
--------------------------------------

I've merged this back into branch-2.6 since it is safe, and it is causing conflicts with too many cherry-picks.

> TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS
> ---------------------------------------------------------------------------
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Wei Yan
> Assignee: Wei Yan
> Fix For: 2.6.0
> Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch, yarn-2635-4.patch
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, TestRMRestart would fail.
[jira] [Updated] (YARN-2843) NodeLabels manager should trim all inputs for hosts and labels
    [ https://issues.apache.org/jira/browse/YARN-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-2843:
--------------------------------
    Target Version/s: 2.6.0  (was: 2.7.0)

> NodeLabels manager should trim all inputs for hosts and labels
> ---------------------------------------------------------------
>
> Key: YARN-2843
> URL: https://issues.apache.org/jira/browse/YARN-2843
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Sushmitha Sreenivasan
> Assignee: Wangda Tan
> Fix For: 2.6.0
> Attachments: YARN-2843-1.patch, YARN-2843-2.patch
>
> NodeLabels manager should trim all inputs for hosts and labels.
[jira] [Commented] (YARN-2841) RMProxy should retry EOFException
    [ https://issues.apache.org/jira/browse/YARN-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208441#comment-14208441 ]

Arun C Murthy commented on YARN-2841:
--------------------------------------

Merged this into branch-2.6 for hadoop-2.6.0-rc1.

> RMProxy should retry EOFException
> ----------------------------------
>
> Key: YARN-2841
> URL: https://issues.apache.org/jira/browse/YARN-2841
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Jian He
> Assignee: Jian He
> Priority: Critical
> Fix For: 2.6.0
> Attachments: YARN-2841.1.patch
[jira] [Updated] (YARN-2843) NodeLabels manager should trim all inputs for hosts and labels
    [ https://issues.apache.org/jira/browse/YARN-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-2843:
--------------------------------
    Fix Version/s:     (was: 2.7.0)
                       2.6.0

> NodeLabels manager should trim all inputs for hosts and labels
> ---------------------------------------------------------------
>
> Key: YARN-2843
> URL: https://issues.apache.org/jira/browse/YARN-2843
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Sushmitha Sreenivasan
> Assignee: Wangda Tan
> Fix For: 2.6.0
> Attachments: YARN-2843-1.patch, YARN-2843-2.patch
>
> NodeLabels manager should trim all inputs for hosts and labels.
[jira] [Commented] (YARN-2843) NodeLabels manager should trim all inputs for hosts and labels
    [ https://issues.apache.org/jira/browse/YARN-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208440#comment-14208440 ]

Arun C Murthy commented on YARN-2843:
--------------------------------------

Merged this into branch-2.6 for hadoop-2.6.0-rc1.

> NodeLabels manager should trim all inputs for hosts and labels
> ---------------------------------------------------------------
>
> Key: YARN-2843
> URL: https://issues.apache.org/jira/browse/YARN-2843
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Sushmitha Sreenivasan
> Assignee: Wangda Tan
> Fix For: 2.6.0
> Attachments: YARN-2843-1.patch, YARN-2843-2.patch
>
> NodeLabels manager should trim all inputs for hosts and labels.
[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
    [ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-1964:
--------------------------------
    Assignee: Abin Shahab  (was: Ravi Prakash)

> Create Docker analog of the LinuxContainerExecutor in YARN
> -----------------------------------------------------------
>
> Key: YARN-1964
> URL: https://issues.apache.org/jira/browse/YARN-1964
> Project: Hadoop YARN
> Issue Type: New Feature
> Affects Versions: 2.2.0
> Reporter: Arun C Murthy
> Assignee: Abin Shahab
> Fix For: 2.6.0
> Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch
>
> Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In the context of YARN, support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (an entire Linux file system, incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with the requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine).
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
    [ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208615#comment-14208615 ]

Arun C Murthy commented on YARN-1964:
--------------------------------------

[~raviprak] I'm concerned this is coming in VERY late into 2.6... we've been in closedown mode for a while. The only mitigating factor is that this is fairly isolated since it's a new {{ContainerExecutor}}, and we can label it as an *alpha* feature. Any other objections to pulling this into 2.6?

> Create Docker analog of the LinuxContainerExecutor in YARN
> -----------------------------------------------------------
>
> Key: YARN-1964
> URL: https://issues.apache.org/jira/browse/YARN-1964
> Project: Hadoop YARN
> Issue Type: New Feature
> Affects Versions: 2.2.0
> Reporter: Arun C Murthy
> Assignee: Abin Shahab
> Fix For: 2.6.0
> Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch
>
> Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In the context of YARN, support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (an entire Linux file system, incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with the requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine).
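[Editor's note] Because the feature is "just" a new {{ContainerExecutor}}, opting in would presumably follow the usual executor switch in yarn-site.xml. The property name below is the standard NM executor setting; the executor class name is taken from the patch under review and should be treated as tentative:

{code:xml}
<!-- Tentative sketch: select the Docker-based executor instead of the
     default. The class name comes from the YARN-1964 patch and may
     change before commit. -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor</value>
</property>
{code}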
[jira] [Commented] (YARN-2830) Add backwards compatible ContainerId.newInstance constructor for use within Tez Local Mode
    [ https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204002#comment-14204002 ]

Arun C Murthy commented on YARN-2830:
--------------------------------------

Is this ready to go?

> Add backwards compatible ContainerId.newInstance constructor for use within Tez Local Mode
> -------------------------------------------------------------------------------------------
>
> Key: YARN-2830
> URL: https://issues.apache.org/jira/browse/YARN-2830
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Priority: Blocker
> Attachments: YARN-2830-v1.patch, YARN-2830-v2.patch, YARN-2830-v3.patch, YARN-2830-v4.patch
>
> YARN-2229 modified the private unstable API for construction. Tez uses this API (it shouldn't, but does) for Tez Local Mode. This causes a NoSuchMethodError when using Tez compiled against pre-2.6. Instead, I propose we add back the backwards-compatible API, since overflow is not a problem in Tez local mode.
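[Editor's note] A hypothetical shim illustrating the compatibility idea: the old int-typed factory simply delegates to the long-typed factory that YARN-2229 introduced, so pre-2.6 callers keep linking. This is a sketch of the concept, not the actual committed patch:

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;

public final class ContainerIdCompat {
  // The int argument widens implicitly to the long-based factory
  // (ContainerId.newContainerId) added by YARN-2229.
  public static ContainerId newInstance(ApplicationAttemptId attempt, int id) {
    return ContainerId.newContainerId(attempt, id);
  }
}
{code}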
[jira] [Updated] (YARN-2834) Resource manager crashed with Null Pointer Exception
    [ https://issues.apache.org/jira/browse/YARN-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-2834:
--------------------------------
    Priority: Blocker  (was: Critical)

> Resource manager crashed with Null Pointer Exception
> -----------------------------------------------------
>
> Key: YARN-2834
> URL: https://issues.apache.org/jira/browse/YARN-2834
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Yesha Vora
> Assignee: Jian He
> Priority: Blocker
> Attachments: YARN-2834.1.patch
>
> Resource manager failed after restart.
> {noformat}
> 2014-11-09 04:12:53,013 INFO capacity.CapacityScheduler (CapacityScheduler.java:initializeQueues(467)) - Initialized root queue root: numChildQueue= 2, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, numApps=0, numContainers=0
> 2014-11-09 04:12:53,013 INFO capacity.CapacityScheduler (CapacityScheduler.java:initializeQueueMappings(436)) - Initialized queue mappings, override: false
> 2014-11-09 04:12:53,013 INFO capacity.CapacityScheduler (CapacityScheduler.java:initScheduler(305)) - Initialized CapacityScheduler with calculator=class org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, minimumAllocation=<memory:256, vCores:1>, maximumAllocation=<memory:2048, vCores:32>, asynchronousScheduling=false, asyncScheduleInterval=5ms
> 2014-11-09 04:12:53,015 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service ResourceManager failed in state STARTED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:734)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1089)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1041)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1005)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:821)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:101)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:843)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:826)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:701)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:312)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:413)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1207)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:590)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1014)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1051)
>         at
[jira] [Commented] (YARN-2579) Both RM's state is Active, but 1 RM is not really active.
    [ https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198729#comment-14198729 ]

Arun C Murthy commented on YARN-2579:
--------------------------------------

Can we please get this in today? Tx.

> Both RM's state is Active, but 1 RM is not really active.
> ----------------------------------------------------------
>
> Key: YARN-2579
> URL: https://issues.apache.org/jira/browse/YARN-2579
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.5.1
> Reporter: Rohith
> Assignee: Rohith
> Priority: Blocker
> Attachments: YARN-2579-20141105.1.patch, YARN-2579-20141105.patch, YARN-2579.patch, YARN-2579.patch
>
> I encountered a situation where both RMs' web pages were accessible and their state displayed as Active, but one RM's ActiveServices were stopped.
[jira] [Created] (YARN-2817) Disk as a resource in YARN
Arun C Murthy created YARN-2817:
-----------------------------------

Summary: Disk as a resource in YARN
Key: YARN-2817
URL: https://issues.apache.org/jira/browse/YARN-2817
Project: Hadoop YARN
Issue Type: New Feature
Components: scheduler
Reporter: Arun C Murthy
Assignee: Arun C Murthy

As YARN continues to cover new ground in terms of new workloads, disk is becoming a very important resource to govern. It might be prudent to start with something very simple - allow applications to request entire drives (e.g. 2 drives out of the 12 available on a node); we can then also add support for specific iops, bandwidth etc.
[jira] [Commented] (YARN-2817) Disk as a resource in YARN
    [ https://issues.apache.org/jira/browse/YARN-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199910#comment-14199910 ]

Arun C Murthy commented on YARN-2817:
--------------------------------------

Kafka on YARN (KAFKA-1754) would benefit enormously if it could reserve a certain number of drives on a node exclusively.

> Disk as a resource in YARN
> ---------------------------
>
> Key: YARN-2817
> URL: https://issues.apache.org/jira/browse/YARN-2817
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: scheduler
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
>
> As YARN continues to cover new ground in terms of new workloads, disk is becoming a very important resource to govern. It might be prudent to start with something very simple - allow applications to request entire drives (e.g. 2 drives out of the 12 available on a node); we can then also add support for specific iops, bandwidth etc.
[jira] [Commented] (YARN-2817) Disk drive as a resource in YARN
    [ https://issues.apache.org/jira/browse/YARN-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199926#comment-14199926 ]

Arun C Murthy commented on YARN-2817:
--------------------------------------

Sure, I'll link it to YARN-2139. Tx

> Disk drive as a resource in YARN
> ---------------------------------
>
> Key: YARN-2817
> URL: https://issues.apache.org/jira/browse/YARN-2817
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: scheduler
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
>
> As YARN continues to cover new ground in terms of new workloads, disk is becoming a very important resource to govern. It might be prudent to start with something very simple - allow applications to request entire drives (e.g. 2 drives out of the 12 available on a node); we can then also add support for specific iops, bandwidth etc.
[jira] [Updated] (YARN-2817) Disk drive as a resource in YARN
    [ https://issues.apache.org/jira/browse/YARN-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-2817:
--------------------------------
    Summary: Disk drive as a resource in YARN  (was: Disk as a resource in YARN)

> Disk drive as a resource in YARN
> ---------------------------------
>
> Key: YARN-2817
> URL: https://issues.apache.org/jira/browse/YARN-2817
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: scheduler
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
>
> As YARN continues to cover new ground in terms of new workloads, disk is becoming a very important resource to govern. It might be prudent to start with something very simple - allow applications to request entire drives (e.g. 2 drives out of the 12 available on a node); we can then also add support for specific iops, bandwidth etc.
[jira] [Comment Edited] (YARN-2817) Disk drive as a resource in YARN
    [ https://issues.apache.org/jira/browse/YARN-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199926#comment-14199926 ]

Arun C Murthy edited comment on YARN-2817 at 11/6/14 7:08 AM:
---------------------------------------------------------------

Sure, I'll link it to YARN-2139. We can use this jira to track supporting disk drive as a resource. Tx

was (Author: acmurthy):
Sure, I'll link it to YARN-2139. Tx

> Disk drive as a resource in YARN
> ---------------------------------
>
> Key: YARN-2817
> URL: https://issues.apache.org/jira/browse/YARN-2817
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: scheduler
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
>
> As YARN continues to cover new ground in terms of new workloads, disk is becoming a very important resource to govern. It might be prudent to start with something very simple - allow applications to request entire drives (e.g. 2 drives out of the 12 available on a node); we can then also add support for specific iops, bandwidth etc.
[jira] [Commented] (YARN-2139) Add support for disk IO isolation/scheduling for containers
    [ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199957#comment-14199957 ]

Arun C Murthy commented on YARN-2139:
--------------------------------------

[~ywskycn] - thanks for the design doc, it's well put together. Some feedback:
# We shouldn't embed Linux- or blkio-specific semantics such as {{proportional weight division}} into YARN. We need something generic such as {{bandwidth}}, which can be understood by users, supportable on heterogeneous nodes in the same cluster, and supportable on other platforms like Windows.
# Spindle locality or I/O parallelism is a real concern - we should probably support both {{bandwidth}} and {{spindles}}.
# Spindle locality or I/O parallelism cannot be tied to HDFS. In fact, YARN should not have a dependency on HDFS at all (*smile*)! This is particularly important in light of developments like Kafka-on-YARN (KAFKA-1754), because people want to use YARN to deploy only Kafka, Storm etc. YARN-2817 helps in this regard.

Makes sense?

> Add support for disk IO isolation/scheduling for containers
> -------------------------------------------------------------
>
> Key: YARN-2139
> URL: https://issues.apache.org/jira/browse/YARN-2139
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Wei Yan
> Assignee: Wei Yan
> Attachments: Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf
[jira] [Commented] (YARN-2481) YARN should allow defining the location of java
    [ https://issues.apache.org/jira/browse/YARN-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150049#comment-14150049 ]

Arun C Murthy commented on YARN-2481:
--------------------------------------

[~ashahab] YARN already allows JAVA_HOME to be overridden... take a look at {{ApplicationConstants.Environment.JAVA_HOME}} and {{YarnConfiguration.DEFAULT_NM_ENV_WHITELIST}} for the code path.

> YARN should allow defining the location of java
> -------------------------------------------------
>
> Key: YARN-2481
> URL: https://issues.apache.org/jira/browse/YARN-2481
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Abin Shahab
>
> YARN right now uses the location of JAVA_HOME on the host to launch containers. This does not work with Docker containers, which have their own filesystem namespace and OS. If the location of the Java binary of the container to be launched were configurable, YARN could launch containers that have java in a different location than the host.
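[Editor's note] A minimal sketch of the override path the comment points to: an AM puts JAVA_HOME into its containers' launch environment. The "/opt/alt-jdk" path is a made-up example; the enum, Records factory, and setEnvironment call are standard YARN client API:

{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.util.Records;

public class JavaHomeOverride {
  public static ContainerLaunchContext withJavaHome() {
    // Point launched containers at a JVM that lives inside the container
    // image rather than on the host.
    Map<String, String> env = new HashMap<String, String>();
    env.put(ApplicationConstants.Environment.JAVA_HOME.name(), "/opt/alt-jdk");

    ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
    ctx.setEnvironment(env);
    return ctx;
  }
}
{code}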
[jira] [Commented] (YARN-2411) [Capacity Scheduler] support simple user and group mappings to queues
    [ https://issues.apache.org/jira/browse/YARN-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095825#comment-14095825 ]

Arun C Murthy commented on YARN-2411:
--------------------------------------

Looks good to me. Hopefully this should be a simple enhancement in CS.

> [Capacity Scheduler] support simple user and group mappings to queues
> -----------------------------------------------------------------------
>
> Key: YARN-2411
> URL: https://issues.apache.org/jira/browse/YARN-2411
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacityscheduler
> Reporter: Ram Venkatesh
>
> YARN-2257 has a proposal to extend and share the queue placement rules for the fair scheduler and the capacity scheduler. This is a good long-term solution to streamline queue placement in both schedulers, but it has core infra work that has to happen first and might require changes to current features in all schedulers, along with corresponding configuration changes, if any.
> I would like to propose a change with a smaller scope in the capacity scheduler that addresses the core use cases for implicitly mapping jobs that have the default queue or no queue specified to specific queues, based on the submitting user and user groups. It will be useful in a number of real-world scenarios and can be migrated over to the unified scheme when YARN-2257 becomes available.
> The proposal is to add two new configuration options:
> yarn.scheduler.capacity.queue-mappings.enable
> A boolean that controls if queue mappings are enabled; default is false.
> and,
> yarn.scheduler.capacity.queue-mappings
> A string that specifies a list of mappings in the following format:
> map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]*
> map_specifier := user (u) | group (g)
> source_attribute := user | group | %user
> queue_name := the name of the mapped queue | %user | %primary_group
> The mappings will be evaluated left to right, and the first valid mapping will be used. If the mapped queue does not exist, or the current user does not have permission to submit jobs to the mapped queue, the submission will fail.
> Example usages:
> 1. user1 is mapped to queue1, group1 is mapped to queue2:
> u:user1:queue1,g:group1:queue2
> 2. To map users to queues with the same name as the user:
> u:%user:%user
> I am happy to volunteer to take this up.
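[Editor's note] Putting the proposal together, a capacity-scheduler configuration sketch; the property names are the ones proposed in this JIRA and may differ from what was finally committed:

{code:xml}
<!-- Sketch of the proposed configuration: map user1 to queue1, group1 to
     queue2, and everyone else to a queue named after themselves. Mappings
     are evaluated left to right; the first valid one wins. -->
<property>
  <name>yarn.scheduler.capacity.queue-mappings.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <value>u:user1:queue1,g:group1:queue2,u:%user:%user</value>
</property>
{code}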
[jira] [Updated] (YARN-2411) [Capacity Scheduler] support simple user and group mappings to queues
    [ https://issues.apache.org/jira/browse/YARN-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-2411:
--------------------------------
    Assignee: Ram Venkatesh

> [Capacity Scheduler] support simple user and group mappings to queues
> -----------------------------------------------------------------------
>
> Key: YARN-2411
> URL: https://issues.apache.org/jira/browse/YARN-2411
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacityscheduler
> Reporter: Ram Venkatesh
> Assignee: Ram Venkatesh
>
> YARN-2257 has a proposal to extend and share the queue placement rules for the fair scheduler and the capacity scheduler. This is a good long-term solution to streamline queue placement in both schedulers, but it has core infra work that has to happen first and might require changes to current features in all schedulers, along with corresponding configuration changes, if any.
> I would like to propose a change with a smaller scope in the capacity scheduler that addresses the core use cases for implicitly mapping jobs that have the default queue or no queue specified to specific queues, based on the submitting user and user groups. It will be useful in a number of real-world scenarios and can be migrated over to the unified scheme when YARN-2257 becomes available.
> The proposal is to add two new configuration options:
> yarn.scheduler.capacity.queue-mappings.enable
> A boolean that controls if queue mappings are enabled; default is false.
> and,
> yarn.scheduler.capacity.queue-mappings
> A string that specifies a list of mappings in the following format:
> map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]*
> map_specifier := user (u) | group (g)
> source_attribute := user | group | %user
> queue_name := the name of the mapped queue | %user | %primary_group
> The mappings will be evaluated left to right, and the first valid mapping will be used. If the mapped queue does not exist, or the current user does not have permission to submit jobs to the mapped queue, the submission will fail.
> Example usages:
> 1. user1 is mapped to queue1, group1 is mapped to queue2:
> u:user1:queue1,g:group1:queue2
> 2. To map users to queues with the same name as the user:
> u:%user:%user
> I am happy to volunteer to take this up.
[jira] [Assigned] (YARN-1488) Allow containers to delegate resources to another container
    [ https://issues.apache.org/jira/browse/YARN-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy reassigned YARN-1488:
-----------------------------------

    Assignee: Arun C Murthy

> Allow containers to delegate resources to another container
> -------------------------------------------------------------
>
> Key: YARN-1488
> URL: https://issues.apache.org/jira/browse/YARN-1488
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
>
> We should allow containers to delegate resources to another container. This would allow external frameworks to share not just YARN's resource-management capabilities but also its workload-management capabilities.
[jira] [Commented] (YARN-1488) Allow containers to delegate resources to another container
    [ https://issues.apache.org/jira/browse/YARN-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088309#comment-14088309 ]

Arun C Murthy commented on YARN-1488:
--------------------------------------

I have an early patch I'll share shortly; this feature ask is coming up in a lot of places and has generated lots of interest.

> Allow containers to delegate resources to another container
> -------------------------------------------------------------
>
> Key: YARN-1488
> URL: https://issues.apache.org/jira/browse/YARN-1488
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
>
> We should allow containers to delegate resources to another container. This would allow external frameworks to share not just YARN's resource-management capabilities but also its workload-management capabilities.
[jira] [Updated] (YARN-1963) Support priorities across applications within the same queue
    [ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated YARN-1963:
--------------------------------
    Assignee: Sunil G  (was: Arun C Murthy)

> Support priorities across applications within the same queue
> --------------------------------------------------------------
>
> Key: YARN-1963
> URL: https://issues.apache.org/jira/browse/YARN-1963
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: api, resourcemanager
> Reporter: Arun C Murthy
> Assignee: Sunil G
>
> It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus it allows existing applications to continue using existing queues, which are usually part of institutional memory.
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986163#comment-13986163 ] Arun C Murthy commented on YARN-1963: - [~sunilg] thanks for taking this up! As [~vinodkv] mentioned, a short writeup will help - I look forward to helping get this in; thanks again! Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues, which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1696) Document RM HA
[ https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13978901#comment-13978901 ] Arun C Murthy commented on YARN-1696: - [~kasha] do you think we can get this in for 2.4.1? Tx. Document RM HA -- Key: YARN-1696 URL: https://issues.apache.org/jira/browse/YARN-1696 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Attachments: YARN-1696.2.patch, yarn-1696-1.patch Add documentation for RM HA. Marking this a blocker for 2.4 as this is required to call RM HA Stable and ready for public consumption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1964) Support Docker containers in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1964: Assignee: (was: Arun C Murthy) Support Docker containers in YARN - Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Reporter: Arun C Murthy Docker (https://www.docker.io/) is an increasingly popular container technology. In the context of YARN, support for Docker will provide an elegant way for applications to *package* their software into a Docker container (an entire Linux file system, incl. custom versions of perl, python, etc.) and use it as a blueprint to launch all their YARN containers with the requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1964) Support Docker containers in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1964: Assignee: Abin Shahab Support Docker containers in YARN - Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Reporter: Arun C Murthy Assignee: Abin Shahab Docker (https://www.docker.io/) is an increasingly popular container technology. In the context of YARN, support for Docker will provide an elegant way for applications to *package* their software into a Docker container (an entire Linux file system, incl. custom versions of perl, python, etc.) and use it as a blueprint to launch all their YARN containers with the requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1964) Support Docker containers in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13977489#comment-13977489 ] Arun C Murthy commented on YARN-1964: - [~ashahab] - That's great to hear, awesome! Thanks for taking this up! Support Docker containers in YARN - Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Reporter: Arun C Murthy Assignee: Abin Shahab Docker (https://www.docker.io/) is an increasingly popular container technology. In the context of YARN, support for Docker will provide an elegant way for applications to *package* their software into a Docker container (an entire Linux file system, incl. custom versions of perl, python, etc.) and use it as a blueprint to launch all their YARN containers with the requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1963) Support priorities across applications within the same queue
Arun C Murthy created YARN-1963: --- Summary: Support priorities across applications within the same queue Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Arun C Murthy It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues, which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1964) Support Docker containers in YARN
Arun C Murthy created YARN-1964: --- Summary: Support Docker containers in YARN Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Reporter: Arun C Murthy Assignee: Arun C Murthy Docker (https://www.docker.io/) is an increasingly popular container technology. In the context of YARN, support for Docker will provide an elegant way for applications to *package* their software into a Docker container (an entire Linux file system, incl. custom versions of perl, python, etc.) and use it as a blueprint to launch all their YARN containers with the requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1932) Javascript injection on the job status page
[ https://issues.apache.org/jira/browse/YARN-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1932: Priority: Blocker (was: Critical) Javascript injection on the job status page --- Key: YARN-1932 URL: https://issues.apache.org/jira/browse/YARN-1932 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.9, 2.5.0 Reporter: Mit Desai Assignee: Mit Desai Priority: Blocker Attachments: YARN-1932.patch Scripts can be injected into the job status page because the diagnostics field is not sanitized. Whatever string you set there will show up on the jobs page as-is, i.e. if you put in any script commands, they will be executed in the browser of the user who opens the page. We need to escape the diagnostics string so that the scripts are not run. -- This message was sent by Atlassian JIRA (v6.2#6252)
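As a rough illustration of the fix (not the attached patch), the diagnostics string can be HTML-escaped before rendering. This sketch assumes commons-lang, which Hadoop already depends on; the helper name is hypothetical.

{code:java}
import org.apache.commons.lang.StringEscapeUtils;

public class DiagnosticsEscapeSketch {
  // Hypothetical helper: escape diagnostics before writing them to a page.
  static String safeDiagnostics(String raw) {
    return raw == null ? "" : StringEscapeUtils.escapeHtml(raw);
  }

  public static void main(String[] args) {
    // Injected markup is rendered as text instead of being executed.
    System.out.println(safeDiagnostics("<script>alert('pwned')</script>"));
    // prints: &lt;script&gt;alert('pwned')&lt;/script&gt;
  }
}
{code}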
[jira] [Updated] (YARN-1701) Improve default paths of timeline store and generic history store
[ https://issues.apache.org/jira/browse/YARN-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1701: Priority: Blocker (was: Major) Improve default paths of timeline store and generic history store - Key: YARN-1701 URL: https://issues.apache.org/jira/browse/YARN-1701 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: YARN-1701.v01.patch, YARN-1701.v02.patch When I enable AHS via yarn.ahs.enabled, the app history is still not visible in the AHS web UI. This is because NullApplicationHistoryStore is set as yarn.resourcemanager.history-writer.class. It would be good to have just one key to enable the basic functionality. yarn.ahs.fs-history-store.uri uses {code}${hadoop.log.dir}{code}, which is a local file system location. However, FileSystemApplicationHistoryStore uses DFS by default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1701) Improve default paths of timeline store and generic history store
[ https://issues.apache.org/jira/browse/YARN-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1701: Affects Version/s: (was: 2.4.1) 2.4.0 Improve default paths of timeline store and generic history store - Key: YARN-1701 URL: https://issues.apache.org/jira/browse/YARN-1701 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: YARN-1701.v01.patch, YARN-1701.v02.patch When I enable AHS via yarn.ahs.enabled, the app history is still not visible in the AHS web UI. This is because NullApplicationHistoryStore is set as yarn.resourcemanager.history-writer.class. It would be good to have just one key to enable the basic functionality. yarn.ahs.fs-history-store.uri uses {code}${hadoop.log.dir}{code}, which is a local file system location. However, FileSystemApplicationHistoryStore uses DFS by default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1701) Improve default paths of timeline store and generic history store
[ https://issues.apache.org/jira/browse/YARN-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1701: Target Version/s: 2.4.1 Improve default paths of timeline store and generic history store - Key: YARN-1701 URL: https://issues.apache.org/jira/browse/YARN-1701 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: YARN-1701.v01.patch, YARN-1701.v02.patch When I enable AHS via yarn.ahs.enabled, the app history is still not visible in the AHS web UI. This is because NullApplicationHistoryStore is set as yarn.resourcemanager.history-writer.class. It would be good to have just one key to enable the basic functionality. yarn.ahs.fs-history-store.uri uses {code}${hadoop.log.dir}{code}, which is a local file system location. However, FileSystemApplicationHistoryStore uses DFS by default. -- This message was sent by Atlassian JIRA (v6.2#6252)
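For illustration only, the two settings discussed in the report could be exercised like this; the key names are the ones quoted above, and the fully qualified store class name is an assumption about the intended non-null writer.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class AhsConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The single switch the reporter would like to be sufficient...
    conf.setBoolean("yarn.ahs.enabled", true);
    // ...currently has no effect on history visibility unless the writer
    // class is also overridden away from NullApplicationHistoryStore
    // (the class name below is an assumption about the intended store).
    conf.set("yarn.resourcemanager.history-writer.class",
        "org.apache.hadoop.yarn.server.applicationhistoryservice."
            + "FileSystemApplicationHistoryStore");
  }
}
{code}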
[jira] [Created] (YARN-1935) Security for ATS
Arun C Murthy created YARN-1935: --- Summary: Security for ATS Key: YARN-1935 URL: https://issues.apache.org/jira/browse/YARN-1935 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Jira to track work to secure the ATS -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-322) Add cpu information to queue metrics
[ https://issues.apache.org/jira/browse/YARN-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-322: --- Fix Version/s: (was: 2.4.0) 2.5.0 Add cpu information to queue metrics Key: YARN-322 URL: https://issues.apache.org/jira/browse/YARN-322 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, scheduler Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 2.5.0 Post YARN-2 we need to add cpu information to queue metrics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1334) YARN should give more info on errors when running failed distributed shell command
[ https://issues.apache.org/jira/browse/YARN-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1334: Fix Version/s: (was: 2.4.0) 2.5.0 YARN should give more info on errors when running failed distributed shell command -- Key: YARN-1334 URL: https://issues.apache.org/jira/browse/YARN-1334 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Affects Versions: 2.3.0 Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.5.0 Attachments: YARN-1334.1.patch Running an incorrect command such as: /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar distributedshell jar -shell_command ./test1.sh -shell_script ./ would show a shell exit code exception with no useful message. It should print out the sysout/syserr of the containers/AM to explain why it is failing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-650) User guide for preemption
[ https://issues.apache.org/jira/browse/YARN-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-650: --- Fix Version/s: (was: 2.4.0) 2.5.0 User guide for preemption - Key: YARN-650 URL: https://issues.apache.org/jira/browse/YARN-650 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Chris Douglas Priority: Minor Fix For: 2.5.0 Attachments: Y650-0.patch YARN-45 added a protocol for the RM to ask for resources back. The docs on writing YARN applications should include a section on how to interpret this message. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1514: Fix Version/s: (was: 2.4.0) 2.5.0 Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.5.0 ZKRMStateStore is very sensitive to ZNode-related operations, as discussed in YARN-1307, YARN-1378 and so on. In particular, ZKRMStateStore#loadState is called when an RM-HA cluster fails over; therefore, its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.2#6252)
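A minimal sketch of what such a utility's timing core might look like, assuming the real tool would wrap ZKRMStateStore#loadState in the Runnable below; everything here is illustrative scaffolding, not the eventual patch.

{code:java}
public class LoadStateBenchmarkSketch {
  // Time a single operation in milliseconds.
  static long timeMillis(Runnable op) {
    long start = System.nanoTime();
    op.run();
    return (System.nanoTime() - start) / 1000000L;
  }

  public static void main(String[] args) {
    long ms = timeMillis(new Runnable() {
      @Override public void run() {
        // Placeholder: the real utility would populate a test ZooKeeper
        // ensemble with app/attempt ZNodes and call
        // ZKRMStateStore#loadState here.
      }
    });
    System.out.println("loadState took " + ms + " ms");
  }
}
{code}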
[jira] [Updated] (YARN-1722) AMRMProtocol should have a way of getting all the nodes in the cluster
[ https://issues.apache.org/jira/browse/YARN-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1722: Fix Version/s: (was: 2.4.0) 2.5.0 AMRMProtocol should have a way of getting all the nodes in the cluster -- Key: YARN-1722 URL: https://issues.apache.org/jira/browse/YARN-1722 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Bikas Saha Fix For: 2.5.0 There is no way for an AM to find out the names of all the nodes in the cluster via the AMRMProtocol. An AM can, at best, only ask for containers at the * location. The only way to get that information is via the ClientRMProtocol, but that is secured by Kerberos or an RMDelegationToken, while the AM has an AMRMToken. This is a pretty important piece of missing functionality. There are other jiras open about getting cluster topology etc., but they haven't been addressed, perhaps for lack of a clear definition of cluster topology. Adding a means to at least get the node information would be a good first step. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-153) PaaS on YARN: a YARN application to demonstrate that YARN can be used as a PaaS
[ https://issues.apache.org/jira/browse/YARN-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-153: --- Fix Version/s: (was: 2.4.0) 2.5.0 PaaS on YARN: a YARN application to demonstrate that YARN can be used as a PaaS Key: YARN-153 URL: https://issues.apache.org/jira/browse/YARN-153 Project: Hadoop YARN Issue Type: New Feature Reporter: Jacob Jaigak Song Assignee: Jacob Jaigak Song Fix For: 2.5.0 Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, MAPREDUCE4393.patch Original Estimate: 336h Time Spent: 336h Remaining Estimate: 0h This application is to demonstrate that YARN can be used for non-mapreduce applications. As Hadoop has already been widely adopted and deployed, and its deployment will only increase in the future, we think it has good potential to serve as a PaaS. I have implemented a proof of concept to demonstrate that YARN can be used as a PaaS (Platform as a Service). I have done a gap analysis against VMware's Cloud Foundry and tried to achieve as many PaaS functionalities as possible on YARN. I'd like to check in this POC as a YARN example application. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1234) Container localizer logs are not created in secured cluster
[ https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1234: Fix Version/s: (was: 2.4.0) 2.5.0 Container localizer logs are not created in secured cluster Key: YARN-1234 URL: https://issues.apache.org/jira/browse/YARN-1234 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Fix For: 2.5.0 When we run the ContainerLocalizer in a secured cluster, we potentially do not create any log file to track its log messages. Having one would be helpful in identifying ContainerLocalization issues in secured clusters. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location
[ https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-314: --- Fix Version/s: (was: 2.4.0) 2.5.0 Schedulers should allow resource requests of different sizes at the same priority and location -- Key: YARN-314 URL: https://issues.apache.org/jira/browse/YARN-314 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.5.0 Currently, resource requests for the same container and locality are expected to all be the same size. While it doesn't look like it's needed for apps currently, and it can be circumvented by specifying different priorities if absolutely necessary, it seems to me that the ability to request containers with different resource requirements at the same priority level should be there for the future and for completeness' sake. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-113) WebAppProxyServlet must use SSLFactory for the HttpClient connections
[ https://issues.apache.org/jira/browse/YARN-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-113: --- Fix Version/s: (was: 2.4.0) 2.5.0 WebAppProxyServlet must use SSLFactory for the HttpClient connections - Key: YARN-113 URL: https://issues.apache.org/jira/browse/YARN-113 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.5.0 The HttpClient must be configured to use the SSLFactory when the web UIs are over HTTPS; otherwise, the proxy servlet fails to connect to the AM because of unknown (self-signed) certificates. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1723) AMRMClientAsync missing blacklist addition and removal functionality
[ https://issues.apache.org/jira/browse/YARN-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1723: Fix Version/s: (was: 2.4.0) 2.5.0 AMRMClientAsync missing blacklist addition and removal functionality Key: YARN-1723 URL: https://issues.apache.org/jira/browse/YARN-1723 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Bikas Saha Fix For: 2.5.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1477) No Submit time on AM web pages
[ https://issues.apache.org/jira/browse/YARN-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1477: Fix Version/s: (was: 2.4.0) 2.5.0 No Submit time on AM web pages -- Key: YARN-1477 URL: https://issues.apache.org/jira/browse/YARN-1477 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Chen He Assignee: Chen He Labels: features Fix For: 2.5.0 Similar to MAPREDUCE-5052, this is a fix on the AM side. Add a submitTime field to the AM's web services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1147) Add end-to-end tests for HA
[ https://issues.apache.org/jira/browse/YARN-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1147: Fix Version/s: (was: 2.4.0) 2.5.0 Add end-to-end tests for HA --- Key: YARN-1147 URL: https://issues.apache.org/jira/browse/YARN-1147 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Xuan Gong Fix For: 2.5.0 While individual sub-tasks add tests for the code they include, it will be handy to write end-to-end tests for HA including some stress testing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1156) Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1156: Fix Version/s: (was: 2.4.0) 2.5.0 Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Priority: Minor Labels: metrics, newbie Fix For: 2.5.0 Attachments: YARN-1156.1.patch The AllocatedGB and AvailableGB metrics are currently integer-typed. If 500MB of memory is allocated to containers four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, the memory actually allocated is 2000MB, but the metric shows 0GB. Let's use a float type for these metrics. -- This message was sent by Atlassian JIRA (v6.2#6252)
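The truncation is easy to reproduce in isolation. This standalone snippet (not the actual NodeManagerMetrics code) shows four 500MB allocations registering as 0GB with integer math but roughly 1.95GB with float math:

{code:java}
public class GbTruncationDemo {
  public static void main(String[] args) {
    int allocatedGB = 0;
    float allocatedGBFloat = 0f;
    for (int i = 0; i < 4; i++) {
      allocatedGB += 500 / 1024;       // integer division adds 0 every time
      allocatedGBFloat += 500 / 1024f; // float division adds ~0.488
    }
    System.out.println(allocatedGB);      // 0, despite 2000 MB allocated
    System.out.println(allocatedGBFloat); // ~1.95
  }
}
{code}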
[jira] [Updated] (YARN-965) NodeManager Metrics containersRunning is not correct when the localizing container process fails or is killed
[ https://issues.apache.org/jira/browse/YARN-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-965: --- Fix Version/s: (was: 2.4.0) 2.5.0 NodeManager Metrics containersRunning is not correct when the localizing container process fails or is killed -- Key: YARN-965 URL: https://issues.apache.org/jira/browse/YARN-965 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.4-alpha Environment: suse linux Reporter: Li Yuan Fix For: 2.5.0 When a container is launched successfully, its state moves from LOCALIZED to RUNNING and containersRunning is incremented. When the state moves from EXITED_WITH_FAILURE or KILLING to DONE, containersRunning is decremented. However, the EXITED_WITH_FAILURE or KILLING state could come from LOCALIZING(LOCALIZED), not RUNNING, which leaves containersRunning less than the actual number. Furthermore, the metrics are inconsistent: containersLaunched != containersCompleted + containersFailed + containersKilled + containersRunning + containersIniting -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1142) MiniYARNCluster web ui does not work properly
[ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1142: Fix Version/s: (was: 2.4.0) 2.5.0 MiniYARNCluster web ui does not work properly - Key: YARN-1142 URL: https://issues.apache.org/jira/browse/YARN-1142 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.5.0 When going to the RM http port, the NM web ui is displayed. It seems there is a singleton somewhere that breaks things when the RM and NMs run in the same process. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-308) Improve documentation about what "asks" means in AMRMProtocol
[ https://issues.apache.org/jira/browse/YARN-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-308: --- Fix Version/s: (was: 2.4.0) 2.5.0 Improve documentation about what "asks" means in AMRMProtocol - Key: YARN-308 URL: https://issues.apache.org/jira/browse/YARN-308 Project: Hadoop YARN Issue Type: Sub-task Components: api, documentation, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.5.0 Attachments: YARN-308.patch It's unclear to me from reading the javadoc exactly what "asks" means when the AM sends a heartbeat to the RM. Is the AM supposed to send a list of all resources that it is waiting for? Or just inform the RM about new ones that it wants? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-614: --- Fix Version/s: (was: 2.4.0) 2.5.0 Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1 -- Key: YARN-614 URL: https://issues.apache.org/jira/browse/YARN-614 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Chris Riccomini Fix For: 2.5.0 Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch Attempts can fail due to a large number of user errors and they should not be retried unnecessarily. The only reason YARN should retry an attempt is when the hardware fails or YARN has an error. NM failing, lost NM and NM disk errors are the hardware errors that come to mind. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1621) Add CLI to list states of yarn container-IDs/hosts
[ https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1621: Fix Version/s: (was: 2.4.0) 2.5.0 Add CLI to list states of yarn container-IDs/hosts -- Key: YARN-1621 URL: https://issues.apache.org/jira/browse/YARN-1621 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Tassapol Athiapinya Fix For: 2.5.0 As more applications are moved to YARN, we need a generic CLI to list the states of yarn containers and their hosts. Today, if a YARN application running in a container hangs, there is no way to deal with it other than manually killing its process. For each running application, it is useful to differentiate between running/succeeded/failed/killed containers. {code:title=proposed yarn cli} $ yarn application -list-containers appId status where status is one of running/succeeded/killed/failed/all {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1327) Fix nodemgr native compilation problems on FreeBSD9
[ https://issues.apache.org/jira/browse/YARN-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1327: Fix Version/s: (was: 2.4.0) 2.5.0 Fix nodemgr native compilation problems on FreeBSD9 --- Key: YARN-1327 URL: https://issues.apache.org/jira/browse/YARN-1327 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Radim Kolar Assignee: Radim Kolar Fix For: 3.0.0, 2.5.0 Attachments: nodemgr-portability.txt There are several portability problems preventing the native component from compiling on FreeBSD. 1. libgen.h is not included. The correct function prototype is there, but Linux glibc has a workaround to define it for the user if libgen.h is not directly included. Include this file directly. 2. Query the max size of the login name using sysconf. This follows the same code style as the rest of the code that uses sysconf. 3. cgroups are a Linux-only feature; make their compilation conditional and return an error if mount_cgroup is attempted on a non-Linux OS. 4. Do not use the POSIX function setpgrp(), since it clashes with the same function from BSD 4.2; use an equivalent function. After inspecting the glibc sources, it's just a shortcut for setpgid(0,0). These changes make it compile on both Linux and FreeBSD. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-160: --- Fix Version/s: (was: 2.4.0) 2.5.0 nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Fix For: 2.5.0 As mentioned in YARN-2 *NM memory and CPU configs*: currently these values come from the NM's config, but we should be able to obtain them from the OS (i.e., in the case of Linux, from /proc/meminfo and /proc/cpuinfo). As this is highly OS-dependent, we should have an interface that obtains this information. In addition, implementations of this interface should be able to specify a mem/cpu offset (an amount of mem/cpu not to be available as YARN resources); this would allow reserving mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.2#6252)
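As a sketch of the Linux case only (the eventual interface would have to abstract over OSes), total memory can be read straight from /proc/meminfo; the class name here is illustrative, not part of any proposed API.

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ProcMemInfoSketch {
  public static void main(String[] args) throws IOException {
    // Linux-only: the MemTotal line carries the machine's physical memory.
    for (String line : Files.readAllLines(
        Paths.get("/proc/meminfo"), StandardCharsets.UTF_8)) {
      if (line.startsWith("MemTotal:")) {
        System.out.println(line); // e.g. "MemTotal:       16337152 kB"
        break;
      }
    }
  }
}
{code}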
[jira] [Updated] (YARN-745) Move UnmanagedAMLauncher to yarn client package
[ https://issues.apache.org/jira/browse/YARN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-745: --- Fix Version/s: (was: 2.4.0) 2.5.0 Move UnmanagedAMLauncher to yarn client package --- Key: YARN-745 URL: https://issues.apache.org/jira/browse/YARN-745 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Bikas Saha Fix For: 2.5.0 It's currently sitting in the yarn applications project, which sounds wrong. The client project sounds better, since it contains the utilities/libraries that clients use to write and debug yarn applications. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965515#comment-13965515 ] Arun C Murthy commented on YARN-1769: - Sorry guys, been slammed. I'll take a look at this presently. Tx. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact that there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required, and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Any time it hits the limit on the number reserved, it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fulfill the request. The other place for improvement is that currently reservations count against your queue capacity. If you have reservations, you could hit the various limits, which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to get its resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active
[ https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960929#comment-13960929 ] Arun C Murthy commented on YARN-1878: - [~xgong] is this ready to go? Let's get this into 2.4.1. Tx. Yarn standby RM taking long to transition to active --- Key: YARN-1878 URL: https://issues.apache.org/jira/browse/YARN-1878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Xuan Gong Attachments: YARN-1878.1.patch In our HA tests we are noticing that it can sometimes take up to 10s for the standby RM to transition to active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1878) Yarn standby RM taking long to transition to active
[ https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1878: Target Version/s: 2.4.1 Yarn standby RM taking long to transition to active --- Key: YARN-1878 URL: https://issues.apache.org/jira/browse/YARN-1878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Xuan Gong Attachments: YARN-1878.1.patch In our HA tests we are noticing that it can sometimes take up to 10s for the standby RM to transition to active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1878) Yarn standby RM taking long to transition to active
[ https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1878: Priority: Blocker (was: Major) Yarn standby RM taking long to transition to active --- Key: YARN-1878 URL: https://issues.apache.org/jira/browse/YARN-1878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1878.1.patch In our HA tests we are noticing that it can sometimes take up to 10s for the standby RM to transition to active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1696) Document RM HA
[ https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1696: Target Version/s: 2.4.1 (was: 2.4.0) Document RM HA -- Key: YARN-1696 URL: https://issues.apache.org/jira/browse/YARN-1696 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Attachments: YARN-1696.2.patch, yarn-1696-1.patch Add documentation for RM HA. Marking this a blocker for 2.4 as this is required to call RM HA Stable and ready for public consumption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1696) Document RM HA
[ https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13955015#comment-13955015 ] Arun C Murthy commented on YARN-1696: - [~kasha] - I'm almost done with rc0, moving this to 2.4.1 - if we need to spin rc1 we can get this in. Else, we can manually put this doc on the site when ready for 2.4.0. Thanks. Document RM HA -- Key: YARN-1696 URL: https://issues.apache.org/jira/browse/YARN-1696 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Attachments: YARN-1696.2.patch, yarn-1696-1.patch Add documentation for RM HA. Marking this a blocker for 2.4 as this is required to call RM HA Stable and ready for public consumption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1696) Document RM HA
[ https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13954411#comment-13954411 ] Arun C Murthy commented on YARN-1696: - [~kasha] - You think you can update the doc w/ the feedback quick-ish? Thanks. Document RM HA -- Key: YARN-1696 URL: https://issues.apache.org/jira/browse/YARN-1696 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Attachments: YARN-1696.2.patch, yarn-1696-1.patch Add documentation for RM HA. Marking this a blocker for 2.4 as this is required to call RM HA Stable and ready for public consumption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1696) Document RM HA
[ https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943822#comment-13943822 ] Arun C Murthy commented on YARN-1696: - Thanks [~kkambatl]. In the worst case we can put your existing docs on jira if we can't get it in early next week and this is the only one blocking 2.4. Document RM HA -- Key: YARN-1696 URL: https://issues.apache.org/jira/browse/YARN-1696 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Add documentation for RM HA. Marking this a blocker for 2.4 as this is required to call RM HA Stable and ready for public consumption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (YARN-1696) Document RM HA
[ https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943822#comment-13943822 ] Arun C Murthy edited comment on YARN-1696 at 3/22/14 1:15 AM: -- Thanks [~kkambatl]. In the worst case we can put your existing docs on the wiki if we can't get it in early next week and this is the only one blocking 2.4. was (Author: acmurthy): Thanks [~kkambatl]. In the worst case we can put your existing docs on jira if we can't get it in early next week and this is the only one blocking 2.4. Document RM HA -- Key: YARN-1696 URL: https://issues.apache.org/jira/browse/YARN-1696 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Add documentation for RM HA. Marking this a blocker for 2.4 as this is required to call RM HA Stable and ready for public consumption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1696) Document RM HA
[ https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942098#comment-13942098 ] Arun C Murthy commented on YARN-1696: - [~kkambatl] Any update on this? Thanks. Document RM HA -- Key: YARN-1696 URL: https://issues.apache.org/jira/browse/YARN-1696 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Add documentation for RM HA. Marking this a blocker for 2.4 as this is required to call RM HA Stable and ready for public consumption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
[ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942106#comment-13942106 ] Arun C Murthy commented on YARN-1051: - Thanks [~subru], I'll take a look at the update. One thing I've mentioned to [~curino] offline is that I think we are better off relying on enhancing/reducing *priorities* for applications to effect reservations rather than relying on adding/removing queues. Priorities within the same queue are an often-requested feature anyway - that way we can solve multiple goals (operational feature/reservations) with the same underlying mechanism, i.e. priorities. YARN Admission Control/Planner: enhancing the resource allocation model with time. -- Key: YARN-1051 URL: https://issues.apache.org/jira/browse/YARN-1051 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager, scheduler Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-1051-design.pdf, curino_MSR-TR-2013-108.pdf, techreport.pdf In this umbrella JIRA we propose to extend the YARN RM to handle time explicitly, allowing users to reserve capacity over time. This is an important step towards SLAs, long-running services, workflows, and helps with gang scheduling. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942119#comment-13942119 ] Arun C Murthy commented on YARN-1707: - I'm very supportive of features like removing queues (adding is already supported). However, as I just commented on YARN-1051, I think we are better off relying on enhancing/reducing priorities rather than adding/removing queues. Making the CapacityScheduler more dynamic - Key: YARN-1707 URL: https://issues.apache.org/jira/browse/YARN-1707 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling, we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella jira YARN-1051. Concretely this requires the following changes: * create queues dynamically * destroy queues dynamically * dynamically change queue parameters (e.g., capacity) * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100% We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
[ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942122#comment-13942122 ] Arun C Murthy commented on YARN-1051: - More color on why I prefer priorities for reservations rather than adding/removing queues... In the vast majority of deployments, queues are an organizational/economic concept (e.g. per-department queues are very common), and queues (hierarchy, names, etc.) are quite stable, well recognized, and part of the institutional memory. If we rely on adding/removing queues to provide reservations, I'm concerned it will cause some confusion among both admins and users. For example, a user/admin trying to debug his application will be quite challenged to figure out the demand/supply of resources when he has to go back in time to reconstruct a programmatically generated queue hierarchy, particularly after it's long gone. Priorities, OTOH, are quite a familiar concept to admins (think unix 'nice'); and, more importantly, are a natural fit for the problem at hand, i.e. temporally increasing/decreasing the priority of an application based on its reservation at a point in time. Furthermore, as I said previously, priorities are an often requested feature - especially by admins. YARN Admission Control/Planner: enhancing the resource allocation model with time. -- Key: YARN-1051 URL: https://issues.apache.org/jira/browse/YARN-1051 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager, scheduler Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-1051-design.pdf, curino_MSR-TR-2013-108.pdf, techreport.pdf In this umbrella JIRA we propose to extend the YARN RM to handle time explicitly, allowing users to reserve capacity over time. This is an important step towards SLAs, long-running services, workflows, and helps with gang scheduling. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
[ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942122#comment-13942122 ] Arun C Murthy edited comment on YARN-1051 at 3/20/14 6:50 PM: -- More color on why I prefer priorities for reservations rather than adding/removing queues... In the vast majority of deployments, queues are an organizational/economic concept (e.g. per-department queues are very common), and queues (hierarchy, names, etc.) are quite stable and well recognized, to the point of being part of the institutional memory. If we rely on adding/removing queues to provide reservations, I'm concerned it will cause some confusion among both admins and users. For example, a user/admin trying to debug his application will be quite challenged to figure out the demand/supply of resources when he has to go back in time to reconstruct a programmatically generated queue hierarchy, particularly after it's long gone. Priorities, OTOH, are quite a familiar concept to admins (think unix 'nice'); and, more importantly, are a natural fit for the problem at hand, i.e. temporally increasing/decreasing the priority of an application based on its reservation at a point in time. Furthermore, as I said previously, priorities are an often requested feature - especially by admins. was (Author: acmurthy): More color on why I prefer priorities for reservations rather than adding/removing queues... In the vast majority of deployments, queues are an organizational/economic concept (e.g. per-department queues are very common), and queues (hierarchy, names, etc.) are quite stable, well recognized, and part of the institutional memory. If we rely on adding/removing queues to provide reservations, I'm concerned it will cause some confusion among both admins and users. For example, a user/admin trying to debug his application will be quite challenged to figure out the demand/supply of resources when he has to go back in time to reconstruct a programmatically generated queue hierarchy, particularly after it's long gone. Priorities, OTOH, are quite a familiar concept to admins (think unix 'nice'); and, more importantly, are a natural fit for the problem at hand, i.e. temporally increasing/decreasing the priority of an application based on its reservation at a point in time. Furthermore, as I said previously, priorities are an often requested feature - especially by admins. YARN Admission Control/Planner: enhancing the resource allocation model with time. -- Key: YARN-1051 URL: https://issues.apache.org/jira/browse/YARN-1051 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager, scheduler Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-1051-design.pdf, curino_MSR-TR-2013-108.pdf, techreport.pdf In this umbrella JIRA we propose to extend the YARN RM to handle time explicitly, allowing users to reserve capacity over time. This is an important step towards SLAs, long-running services, workflows, and helps with gang scheduling. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-796: --- Attachment: YARN-796.patch I had the _luxury_ of a long flight... *cough* Here is a very early WIP patch which illustrates an approach; there is a lot of W left in the WIP. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Arun C Murthy Attachments: YARN-796.patch It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
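Purely as a thought experiment (none of these types exist in YARN, and the WIP patch may look nothing like this), the matching at the core of the feature reduces to a subset check between a node's admin-assigned labels and the labels a resource-request asks for:

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class LabelMatchSketch {
  // A request is satisfiable on a node whose labels cover everything asked.
  static boolean satisfies(Set<String> nodeLabels, Set<String> requested) {
    return nodeLabels.containsAll(requested);
  }

  public static void main(String[] args) {
    Set<String> node = new HashSet<String>(
        Arrays.asList("linux", "x86_64", "gpu"));
    System.out.println(satisfies(node,
        new HashSet<String>(Arrays.asList("gpu"))));     // true
    System.out.println(satisfies(node,
        new HashSet<String>(Arrays.asList("windows")))); // false
  }
}
{code}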
[jira] [Updated] (YARN-1512) Enhance CS to decouple scheduling from node heartbeats
[ https://issues.apache.org/jira/browse/YARN-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1512: Fix Version/s: 2.4.0 Enhance CS to decouple scheduling from node heartbeats -- Key: YARN-1512 URL: https://issues.apache.org/jira/browse/YARN-1512 Project: Hadoop YARN Issue Type: Improvement Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 2.4.0 Attachments: YARN-1512.2.patch, YARN-1512.patch, YARN-1512.patch, YARN-1512.patch, YARN-1512.patch Enhance CS to decouple scheduling from node heartbeats; a prototype has improved latency significantly. -- This message was sent by Atlassian JIRA (v6.2#6252)