[jira] [Created] (YARN-6106) Add doc for tag 'allowPreemptionFrom' in Fair Scheduler
Yufei Gu created YARN-6106:
Summary: Add doc for tag 'allowPreemptionFrom' in Fair Scheduler
Key: YARN-6106
URL: https://issues.apache.org/jira/browse/YARN-6106
Project: Hadoop YARN
Issue Type: Improvement
Components: fairscheduler
Reporter: Yufei Gu
Assignee: Yufei Gu
Priority: Minor

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-6105) Support for new REST end point /clusterids
Rohith Sharma K S created YARN-6105:
Summary: Support for new REST end point /clusterids
Key: YARN-6105
URL: https://issues.apache.org/jira/browse/YARN-6105
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Rohith Sharma K S

As discussed in YARN-5378 and YARN-6095, it would be useful to have a */clusterids* end point that returns the list of clusterIds known to the back end.

Use case: In the cloud, clusters are spun up and destroyed arbitrarily. Each cluster has its own clusterId, which the UI has no way of knowing. All of the newly spun-up clusters use the same ATS server and the same web UI. With the list available, an admin can select a clusterId and navigate to any page. So it is worth listing the clusterIds from ATS.
Re: [VOTE] Release cadence and EOL
+1

I would also like to see some process guidelines. I should have brought this up on the discussion thread, but I thought of them only now :(

- Is an RM responsible for all maintenance releases against a minor release, or for finding another RM to drive a maintenance release? In the past, this hasn't been a major issue.
- When do we pick/volunteer to RM a minor release? IMO, this should be right after the previous release goes out. For example, Junping is driving 2.8.0 now. As soon as that is done, we need to find a volunteer to RM 2.9.0, which is due 6 months later.
- The release process has multiple steps, which differ for major/minor/maintenance releases. It would be nice to capture/track how long each step takes so the RM can be prepared, e.g. herding the cats for an RC takes x weeks, compatibility checks take y days of work.
Re: [VOTE] Release cadence and EOL
Thanks for correcting me! I left out a sentence by mistake. :)

To correct the proposal we're voting for:

A minor release on the latest major line should be every 6 months, and a maintenance release on a minor release (as there may be concurrently maintained minor releases) every 2 months.

A minor release line is end-of-lifed 2 years after it is released or there are 2 newer minor releases, whichever is sooner. The community reserves the right to extend or shorten the life of a release line if there is a good reason to do so.

Sorry for the snafu.

Regards,
Sangjin
Re: [VOTE] Release cadence and EOL
Thanks for driving this, Sangjin. Quick question, though: the subject line is "Release cadence and EOL," but I don't see anything about cadence in the proposal. Did I miss something?

Daniel
[jira] [Created] (YARN-6104) RegistrySecurity overrides zookeeper sasl system properties
Billie Rinaldi created YARN-6104:
Summary: RegistrySecurity overrides zookeeper sasl system properties
Key: YARN-6104
URL: https://issues.apache.org/jira/browse/YARN-6104
Project: Hadoop YARN
Issue Type: Bug
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi

If the RM is configured with JAVA_OPTS setting the zookeeper.sasl.client.username and zookeeper.sasl.clientconfig properties, these are ignored and overwritten by RegistrySecurity in setZKSaslClientProperties.
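One possible way to avoid this kind of overwrite is to set a property only when no value is already present. The sketch below is an illustration under that assumption, not the actual RegistrySecurity code: the class name `SaslPropertyDefaults`, the method `setIfAbsent`, and the fallback value used in the usage note are all hypothetical.

```java
import java.util.Properties;

/** Hypothetical helper; not part of Hadoop or of RegistrySecurity. */
final class SaslPropertyDefaults {
    private SaslPropertyDefaults() {}

    /**
     * Sets key to fallback only when no value is present yet,
     * and returns the value that is in effect afterwards.
     */
    static String setIfAbsent(Properties props, String key, String fallback) {
        String existing = props.getProperty(key);
        if (existing != null) {
            // A value supplied earlier (for example via -D in JAVA_OPTS) wins.
            return existing;
        }
        props.setProperty(key, fallback);
        return fallback;
    }
}
```

Called as `SaslPropertyDefaults.setIfAbsent(System.getProperties(), "zookeeper.sasl.client.username", "zookeeper")`, this would leave a value passed on the command line untouched and fill in the fallback only when nothing was configured.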
[jira] [Created] (YARN-6103) Logging update for ZKRMStateStore
Bibin A Chundatt created YARN-6103:
Summary: Logging update for ZKRMStateStore
Key: YARN-6103
URL: https://issues.apache.org/jira/browse/YARN-6103
Project: Hadoop YARN
Issue Type: Bug
Reporter: Bibin A Chundatt
Priority: Trivial

{code}
LOG.debug(appId + " znode didn't exist. Created a new znode to"
    + " update the application state.");
{code}
Add a check that debug is enabled.
{code}
if (LOG.isDebugEnabled()) {
  LOG.debug((isUpdate ? "Storing " : "Updating ") + dtSequenceNumberPath
      + ". SequenceNumber: " + rmDTIdentifier.getSequenceNumber());
}
{code}
Here isUpdate will always be false.
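The guard requested above can be sketched as follows. To stay self-contained, this example uses `java.util.logging` as a stand-in for the logger in ZKRMStateStore, so `isLoggable(Level.FINE)` plays the role of `isDebugEnabled()`. The class and method names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical illustration of guarding a debug message; not Hadoop code. */
final class GuardedDebugLog {
    private GuardedDebugLog() {}

    /** Builds the message; the string concatenation is the cost the guard avoids. */
    static String znodeCreatedMessage(String appId) {
        return appId + " znode didn't exist. Created a new znode to"
            + " update the application state.";
    }

    /**
     * Logs at FINE only when that level is enabled.
     * Returns true if the message was built and logged, false if it was skipped.
     */
    static boolean logZnodeCreated(Logger log, String appId) {
        if (!log.isLoggable(Level.FINE)) {
            return false;
        }
        log.fine(znodeCreatedMessage(appId));
        return true;
    }
}
```

The point of the guard is that the message string is only concatenated when the level is enabled, which matters on paths that run once per application update.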
[VOTE] Release cadence and EOL
Following up on the discussion thread on this topic (https://s.apache.org/eFOf), I'd like to put the proposal for a vote for the release cadence and EOL. The proposal is as follows:

"A minor release line is end-of-lifed 2 years after it is released or there are 2 newer minor releases, whichever is sooner. The community reserves the right to extend or shorten the life of a release line if there is a good reason to do so."

This also entails that we the Hadoop community commit to following this practice and solving challenges to make it possible. Andrew Wang laid out some of those challenges and what can be done in the discussion thread mentioned above.

I'll set the voting period to 7 days. I understand a majority rule would apply in this case. Your vote is greatly appreciated, and so are suggestions!

Thanks,
Sangjin
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/289/

[Jan 16, 2017 7:11:53 AM] (lei) Revert "HDFS-11259. Update fsck to display maintenance state info.
[Jan 16, 2017 9:45:22 PM] (arp) HDFS-11342. Fix FileInputStream leak in loadLastPartialChunkChecksum.
[Jan 16, 2017 10:43:29 PM] (arp) HDFS-11339. Support File IO sampling for Datanode IO profiling hooks.
[Jan 16, 2017 10:53:53 PM] (jitendra) HDFS-11307. The rpc to portmap service for NFS has hardcoded timeout.
[Jan 17, 2017 12:20:24 AM] (junping_du) YARN-6011. Add a new web service to list the files on a container in
[Jan 17, 2017 1:10:23 AM] (aajisaka) HADOOP-13933. Add haadmin -getAllServiceState option to get the HA state

-1 overall

The following subsystems voted -1:
    asflicense unit

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

    Failed junit tests :
        hadoop.yarn.server.timeline.webapp.TestTimelineWebServices
        hadoop.yarn.server.TestContainerManagerSecurity
        hadoop.yarn.server.TestMiniYarnClusterNodeUtilization

cc:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/289/artifact/out/diff-compile-cc-root.txt [4.0K]

javac:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/289/artifact/out/diff-compile-javac-root.txt [168K]

checkstyle:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/289/artifact/out/diff-checkstyle-root.txt [16M]

pylint:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/289/artifact/out/diff-patch-pylint.txt [20K]

shellcheck:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/289/artifact/out/diff-patch-shellcheck.txt [24K]

shelldocs:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/289/artifact/out/diff-patch-shelldocs.txt [16K]

whitespace:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/289/artifact/out/whitespace-eol.txt [11M]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/289/artifact/out/whitespace-tabs.txt [1.3M]

javadoc:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/289/artifact/out/diff-javadoc-javadoc-root.txt [2.2M]

unit:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/289/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt [12K]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/289/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt [324K]

asflicense:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/289/artifact/out/patch-asflicense-problems.txt [4.0K]

Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org
[jira] [Created] (YARN-6102) On failover RM can crash due to unregistered event to AsyncDispatcher
Ajith S created YARN-6102:
Summary: On failover RM can crash due to unregistered event to AsyncDispatcher
Key: YARN-6102
URL: https://issues.apache.org/jira/browse/YARN-6102
Project: Hadoop YARN
Issue Type: Bug
Reporter: Ajith S
Assignee: Ajith S
Priority: Critical

{code}
2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in dispatcher thread
java.lang.Exception: No handler for registered for class org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
    at java.lang.Thread.run(Thread.java:745)
2017-01-17 16:42:17,914 INFO [AsyncDispatcher ShutDown handler] event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..
{code}

The same stack trace was also seen when {{TestResourceTrackerOnHA}} exits abnormally. After some analysis, I was able to reproduce it.

Once the nodeHeartBeat is sent to the RM, inside {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}}, if an RM failover is triggered before the event is handed to the dispatcher through {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}}, the dispatcher is reset. The new dispatcher, however, is started first, and only then are the event handlers registered, in {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}.

So the order of events looks like this:
1. A node heartbeat is sent to {{ResourceTrackerService}}.
2. In {{ResourceTrackerService.nodeHeartbeat}}, before the event is passed to the dispatcher, an RM failover is triggered.
3. During the failover, the current active RM resets the dispatcher in reinitialize, i.e. ( {{resetDispatcher();}} + {{createAndInitActiveServices();}} ).

If {{ResourceTrackerService.nodeHeartbeat}} invokes the dispatcher between {{resetDispatcher();}} and {{createAndInitActiveServices();}}, the above error occurs: at the point when the {{STATUS_UPDATE}} event is given to the dispatcher in {{ResourceTrackerService}}, the new dispatcher (from the failover) may be started but not yet registered for events.

Using the same steps (pausing the JVM in a debugger), I was able to reproduce this in a production cluster as well: for the {{STATUS_UPDATE}} active service event, the service has yet to forward the event to the RM dispatcher when a failover is called, and the dispatcher reset falls between {{resetDispatcher();}} and {{createAndInitActiveServices();}}.
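The window described above can be modelled in a few lines. This is a deliberately simplified, single-threaded sketch and not the real AsyncDispatcher: `MiniDispatcher` is a hypothetical class, handlers are keyed by the event's Java class rather than by an event-type enum, and there is no queue or dispatcher thread. It only shows that a dispatcher which is started before its handlers are registered rejects any event arriving in between.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical, simplified stand-in for an event dispatcher. */
final class MiniDispatcher {
    private final Map<Class<?>, Consumer<Object>> handlers = new HashMap<>();
    private boolean started;

    /** Registers the handler that receives events of the given class. */
    void register(Class<?> eventClass, Consumer<Object> handler) {
        handlers.put(eventClass, handler);
    }

    void start() {
        started = true;
    }

    /** Delivers the event, or fails if no handler is registered for it yet. */
    void dispatch(Object event) {
        if (!started) {
            throw new IllegalStateException("dispatcher not started");
        }
        Consumer<Object> handler = handlers.get(event.getClass());
        if (handler == null) {
            // Corresponds to the "No handler for registered for class ..." failure.
            throw new IllegalStateException(
                "No handler registered for " + event.getClass());
        }
        handler.accept(event);
    }
}
```

In this model, calling `start()` and then `dispatch(...)` before `register(...)` fails, while registering first succeeds. Whether the actual fix should reorder registration and start, or hold events back during the reset, is not something this sketch decides.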
[jira] [Created] (YARN-6101) Delay scheduling for node resource balance
He Tianyi created YARN-6101:
Summary: Delay scheduling for node resource balance
Key: YARN-6101
URL: https://issues.apache.org/jira/browse/YARN-6101
Project: Hadoop YARN
Issue Type: Improvement
Components: fairscheduler
Reporter: He Tianyi
Priority: Minor

We have observed that usage of Spark in today's clusters has increased dramatically. This introduces a new issue: CPU/MEM utilization on a single node may become imbalanced, because Spark is generally more memory intensive. For example, a node with capability (48 cores, 192 GB memory) cannot satisfy a (1 core, 2 GB memory) request if its currently used resource is (20 cores, 190 GB memory), even though plenty of resources are available across the whole cluster.

One way to avoid this situation is to introduce a strategy during scheduling. This JIRA proposes a delay-scheduling-like approach to achieve a better balance between the different types of resources on each node.

The basic idea is to consider the dominant resource of each node: when a scheduling opportunity on a particular node is offered to a resource request, make sure the allocation changes the dominant resource of the node or, in the worst case, allocate at once when the number of offered scheduling opportunities exceeds a certain number.

With YARN SLS and a simulation file with a hybrid workload (MR+Spark), the approach improved cluster resource usage by nearly 5%. After deploying it to production, we observed an 8% improvement.
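The decision rule above could be sketched roughly as follows. This is one reading of the proposal, not the patch itself. It assumes that "changing the dominant resource" means the request's dominant resource differs from the node's current dominant resource, and that the request has already been checked to fit on the node. `BalanceDelaySketch`, its methods, and the `maxMissed` threshold are hypothetical names.

```java
/** Hypothetical sketch of a balance-aware delay rule; not the YARN-6101 patch. */
final class BalanceDelaySketch {
    private BalanceDelaySketch() {}

    enum Resource { CPU, MEMORY }

    /** Returns the resource with the larger share; ties go to CPU. */
    static Resource dominant(double cpuShare, double memShare) {
        return memShare > cpuShare ? Resource.MEMORY : Resource.CPU;
    }

    /**
     * Decides whether to take a scheduling opportunity on a node.
     * Shares are measured against the node's capacity.
     */
    static boolean shouldAllocate(int capCores, int capMemGb,
                                  int usedCores, int usedMemGb,
                                  int reqCores, int reqMemGb,
                                  int missedOpportunities, int maxMissed) {
        // Worst case: stop delaying once too many opportunities were skipped.
        if (missedOpportunities > maxMissed) {
            return true;
        }
        Resource nodeDominant = dominant(
            (double) usedCores / capCores, (double) usedMemGb / capMemGb);
        Resource requestDominant = dominant(
            (double) reqCores / capCores, (double) reqMemGb / capMemGb);
        // Prefer requests that lean on the node's less-used resource.
        return nodeDominant != requestDominant;
    }
}
```

On a (48 cores, 192 GB) node with (20 cores, 100 GB) in use, memory is dominant, so under this reading a CPU-leaning (1 core, 2 GB) request would be placed at once, while a memory-leaning (1 core, 8 GB) request would be delayed until the threshold is exceeded.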