Folks,

We (i.e., Microsoft) have started stabilizing 2.9 for our production deployment. During planning, we realized that, as part of the upgrade, we need to backport 3.x features that natively support GPUs (and other resource types, such as network IO). We'd like to share that work with the community.
Instead of stabilizing the base release and cherry-picking fixes back to Apache, we want to work publicly and push fixes directly into trunk/.../branch-2 for a stable 2.10.0 release. Our goal is to create a bridge release that moves our production clusters to the 3.x series and addresses scalability problems in large clusters (N*10k nodes). As we find issues, we will file JIRAs and track the resolution of significant regressions/faults on the wiki. LinkedIn also has committed plans for a production deployment of the same branch. We welcome broad participation, particularly since we'll be stabilizing relatively new features.

The exact list of features we would like to backport in YARN is:
- Support for Resource types [1][2]
- Native support for GPUs [3]
- Absolute Resource configuration in CapacityScheduler [4]

With regard to HDFS, we are currently looking mainly at fixes to Router-based Federation and Windows-specific fixes, which should flow normally anyway.

Thoughts?

Thanks,
Subru/Arun

[1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg27786.html
[2] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg28281.html
[3] https://issues.apache.org/jira/browse/YARN-6223
[4] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg28772.html