+0. As Daniel said, we discussed this a lot off-list. Let's make sure the docs are up to snuff and that we update the site release notes with a blurb on resource types.
Hoping we can get a merge VOTE kicked off ASAP (tomorrow?) since we're down to the wire for the proposed RC0 schedule.

On Thu, Oct 19, 2017 at 12:53 PM, Daniel Templeton <[email protected]> wrote:

> After much offline discussion with Wangda, Sunil, Varun V., and Andrew
> we've agreed that it would make sense to pull resource types into
> branch-3.0 ahead of the Hadoop 3.0 RC0. Resource types has already been
> merged into trunk/3.1. Now I'd like to open a discussion about getting it
> into 3.0 GA. Here's the run-down:
>
> Feature Details
> ---------------
> Resource types replaces the two primitives that tracked CPU and memory
> with an array of objects to track an arbitrary set of resources (that must
> always include CPU and memory). The resource manager reads the master list
> of supported resources from its configs. The node managers read their
> resource values from their configs and report them to the resource manager
> in their heartbeats. The clients read the supported resource types from
> their configs (or an RM service) and specify them in the application
> submission. At a high level, nothing else changes.
>
> The Resource object is a core construct in the resource manager and
> scheduler. All application operations end up touching Resource objects as
> we determine fit or share-based priority for applications, queues, and
> nodes. As this feature replaces the core of how Resource objects work,
> resource types impacts almost every aspect of the resource manager's
> operation. The change is pervasive, but not radical.
>
> The resource types patches as merged into trunk/3.1 include an additional
> feature called resource profiles. Resource profiles are actually
> independent of resource types, and either is useful without the other. The
> resource profiles code is still in a bit of flux, so the current plan is to
> pull only the resource types code into branch-3.0. I have backported only
> the resource types patches into the resource-types branch. Unit tests are
> passing, and I don't see any significant risk from the split. The diff
> between the resource-types branch and branch-3.0 is available as a
> branch-3.0 patch on YARN-7013[1].
>
> Justification for 3.0
> ---------------------
> Resource types (leaving out resource profiles) is in a stable state and is
> well tested with unit tests, performance tests, and functional tests with
> both the fair scheduler and the capacity scheduler. Tests were run on both
> the resource-types branch and the original YARN-3926 branch. There is some
> additional work to do, but none of it's critical (except maybe improving
> the docs). Our confidence level in the feature is good.
>
> Resource types doesn't introduce incompatible changes to any Public and
> Stable APIs. There are some incompatible changes to Public and Unstable
> APIs, but that's what a major release is for. The Resource object proto
> retains the CPU and memory fields and adds a new field for any additional
> resource types to retain wire compatibility. Other proto changes are all
> additive.
>
> While it's not possible to turn resource types off per se, if the user
> does not activate the feature, the operation of YARN will be unchanged.
> Getting this feature into Hadoop 3.0 gives us the required groundwork to
> make progress on tidying up the usage details without having to drag in a
> large set of invasive changes into 3.1.
>
> If we don't pull resource types into 3.0, it will open a persistent
> channel through which failures can be introduced through backporting. The
> differences introduced by resource types are significant enough that it
> will be an issue for scheduler and resource manager patches between 3.1 and
> 3.0.
>
> From the other side, resource types is a pervasive change, and there's no
> turning it off. Users will be impacted by it regardless of whether they
> choose to use it or not. While we've tested it, the feature represents a
> large number of changes to core code that's critical to the resource
> manager's operation. If we're going to introduce a large change like this,
> no matter how well tested, we should do it in 3.0 where users already
> expect some bumps in the road. Bringing in a large change like this in a
> 3.1 release, when users expect the release to have stabilized, sounds like
> a bad idea.
>
> What do folks think about pulling resource types back into branch-3.0 in
> time for RC0? Any concerns?
>
> Thanks to Varun Vasudev, Sunil Govind, Wangda Tan, Yufei Gu, Grant Sohn,
> Jason Lowe, Arun Suresh, Karthik Kambatla, Vinod Vavilapalli, and Andrew
> Wang for their work on getting the resource types work done, backported,
> tested, and on track for 3.0.
>
> [1]: https://issues.apache.org/jira/secure/attachment/12892456/YARN-7013.branch-3.0.002.patch
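
For anyone skimming the thread, the data-model change described under "Feature Details" can be pictured roughly like this. It is only an illustrative sketch, not the code on the resource-types branch: the class `SketchResource`, its methods, and the resource name strings are all hypothetical, and an ordered map stands in for the array of resource objects to keep it short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a resource vector; NOT the real YARN Resource class. */
final class SketchResource {
    // Illustrative names for the two resources that must always be present.
    static final String MEMORY = "memory-mb";
    static final String VCORES = "vcores";

    // Insertion-ordered: memory and CPU first, then any additional types.
    private final Map<String, Long> values = new LinkedHashMap<>();

    SketchResource(long memoryMb, long vcores, Map<String, Long> extras) {
        values.put(MEMORY, memoryMb);
        values.put(VCORES, vcores);
        // Additional resource types may extend, but never override, the mandatory two.
        for (Map.Entry<String, Long> e : extras.entrySet()) {
            values.putIfAbsent(e.getKey(), e.getValue());
        }
    }

    /** Returns the amount of the named resource, or 0 if this vector does not carry it. */
    long get(String name) {
        return values.getOrDefault(name, 0L);
    }

    /** A request fits when every resource it asks for is available in sufficient quantity. */
    boolean fitsIn(SketchResource available) {
        for (Map.Entry<String, Long> e : values.entrySet()) {
            if (e.getValue() > available.get(e.getKey())) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch a request that names only memory and CPU behaves exactly as it would with the old two-primitive model, which matches the point that YARN's operation is unchanged if the feature is not activated.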
