Hi Daniel,

Thanks for starting the thread and working on the branch-3.0 merge effort.

I'm in favor of bringing resource types into branch-3.0. Could you please share the tests you have run and the performance numbers comparing branch-3.0 with branch-3.0 + resource types patches? I will +1 the merge if we see similar performance after applying the resource types patches, compared to trunk.

- Wangda

On Thu, Oct 19, 2017 at 1:47 PM, Andrew Wang <[email protected]> wrote:

> +0, as Daniel said we discussed this a lot off-list.
>
> Let's make sure the docs are up to snuff, and we update the site release
> notes to have a blurb on resource types.
>
> Hoping we can get a merge VOTE kicked off ASAP (tomorrow?) since we're down
> to the wire for the proposed RC0 schedule.
>
> On Thu, Oct 19, 2017 at 12:53 PM, Daniel Templeton <[email protected]>
> wrote:
>
> > After much offline discussion with Wangda, Sunil, Varun V., and Andrew
> > we've agreed that it would make sense to pull resource types into
> > branch-3.0 ahead of the Hadoop 3.0 RC0. Resource types has already been
> > merged into trunk/3.1. Now I'd like to open a discussion about getting
> > it into 3.0 GA. Here's the run-down:
> >
> > Feature Details
> > ---------------
> > Resource types replaces the two primitives that tracked CPU and memory
> > with an array of objects to track an arbitrary set of resources (that
> > must always include CPU and memory). The resource manager reads the
> > master list of supported resources from its configs. The node managers
> > read their resource values from their configs and report them to the
> > resource manager in their heartbeats. The clients read the supported
> > resource types from their configs (or an RM service) and specify them
> > in the application submission. At a high level, nothing else changes.
> >
> > The Resource object is a core construct in the resource manager and
> > scheduler. All application operations end up touching Resource objects
> > as we determine fit or share-based priority for applications, queues,
> > and nodes.
> > As this feature replaces the core of how Resource objects work,
> > resource types impacts almost every aspect of the resource manager's
> > operation. The change is pervasive, but not radical.
> >
> > The resource types patches as merged into trunk/3.1 include an
> > additional feature called resource profiles. Resource profiles are
> > actually independent of resource types, and either is useful without
> > the other. The resource profiles code is still in a bit of flux, so the
> > current plan is to pull only the resource types code into branch-3.0.
> > I have backported only the resource types patches into the
> > resource-types branch. Unit tests are passing, and I don't see any
> > significant risk from the split. The diff between the resource-types
> > branch and branch-3.0 is available as a branch-3.0 patch on
> > YARN-7013[1].
> >
> > Justification for 3.0
> > ---------------------
> > Resource types (leaving out resource profiles) is in a stable state and
> > is well tested with unit tests, performance tests, and functional tests
> > with both the fair scheduler and the capacity scheduler. Tests were run
> > on both the resource-types branch and the original YARN-3926 branch.
> > There is some additional work to do, but none of it's critical (except
> > maybe improving the docs). Our confidence level in the feature is good.
> >
> > Resource types doesn't introduce incompatible changes to any Public and
> > Stable APIs. There are some incompatible changes to Public and Unstable
> > APIs, but that's what a major release is for. The Resource object proto
> > retains the CPU and memory fields and adds a new field for any
> > additional resource types to retain wire compatibility. Other proto
> > changes are all additive.
> >
> > While it's not possible to turn resource types off per se, if the user
> > does not activate the feature, the operation of YARN will be unchanged.
> > Getting this feature into Hadoop 3.0 gives us the required groundwork
> > to make progress on tidying up the usage details without having to drag
> > in a large set of invasive changes into 3.1.
> >
> > If we don't pull resource types into 3.0, it will open a persistent
> > channel through which failures can be introduced through backporting.
> > The differences introduced by resource types are significant enough
> > that it will be an issue for scheduler and resource manager patches
> > between 3.1 and 3.0.
> >
> > From the other side, resource types is a pervasive change, and there's
> > no turning it off. Users will be impacted by it regardless of whether
> > they choose to use it or not. While we've tested it, the feature
> > represents a large number of changes to core code that's critical to
> > the resource manager's operation. If we're going to introduce a large
> > change like this, no matter how well tested, we should do it in 3.0
> > where users already expect some bumps in the road. Bringing in a large
> > change like this in a 3.1 release, when users expect the release to
> > have stabilized, sounds like a bad idea.
> >
> > What do folks think about pulling resource types back into branch-3.0
> > in time for RC0? Any concerns?
> >
> > Thanks to Varun Vasudev, Sunil Govind, Wangda Tan, Yufei Gu, Grant
> > Sohn, Jason Lowe, Arun Suresh, Karthik Kambatla, Vinod Vavilapalli, and
> > Andrew Wang for their work on getting the resource types work done,
> > backported, tested, and on track for 3.0.
> >
> > [1]: https://issues.apache.org/jira/secure/attachment/12892456/YARN-7013.branch-3.0.002.patch
