Re: [VOTE] Merge Resource Types (YARN-3926) to branch-3.0

Sunil G Sat, 28 Oct 2017 09:40:25 -0700

+1 (binding)

Thanks Daniel for helping to backport this. I also ran various performance
test cases, including the unit test perf runs mentioned below and SLS tests.


In SLS tests, I found that the performance difference between branch-3.0 and
the resource-types branch is minimal. I ran test scenarios with
8k nodes and 4k nodes, and saw no performance regressions when I
used 2 resource types. I could get around 2800 container allocations per
second on my machine with 8k nodes. I have also gone
through the branch code and trunk, and I could see that all major changes
related to recent performance improvements are pulled in.

- Sunil


On Sat, Oct 28, 2017 at 8:20 PM Daniel Templeton <dan...@cloudera.com>
wrote:

> As promised, here's the updated performance numbers.
>
> Performance reporting is always a tricky business.  I'll do my best here
> to fairly represent the state of things.  We've run a number of
> performance tests.  Those tests include TestCapacitySchedulerPerf, SLS,
> and actual cluster testing.
>
> The summary is that in most scenarios, the resource-types branch is very
> close to branch-3.0 in performance.  There are some large scale SLS
> tests that show a performance drop, but we have not been able to
> replicate those findings on an actual cluster.  Additional cluster
> testing is still in progress.
>
> = TestCapacitySchedulerPerf =
> This unit test, added with YARN-7136, runs a tight loop over the
> scheduler's handling of node update events.  The net effect is similar
> to running 100 apps through 1 queue in a 2-node cluster.  I also
> modified it to run with the fair scheduler, configured with assign
> multiple enabled and set to the max containers supported by the cluster.
>
> - Capacity scheduler -
> Performance of resource-types v/s branch-3.0: 1.0 (no change)
> Performance of resource-types v/s trunk: 1.16 (16% *better*)
> - Fair scheduler -
> Performance of resource-types + YARN-7374 v/s branch-3.0: 1.25 (25%
> *better*)
> Performance of resource-types + YARN-7374 v/s trunk: 1.04 (4% *better*)
>
> These results seem a little optimistic when compared with the SLS
> results, but at worst they provide evidence that the resource types
> changes do not have a significant negative impact.
>
> Wangda and Sunil did some independent testing with this unit test and
> found no significant difference between branch-3.0 and resource-types.
>
> = SLS =
> For SLS, we tested a wide range of scenarios with different node, app,
> task, and queue counts.  We ran these tests for capacity and fair
> scheduler.
>
> The net result is that for the majority of the scenarios we tested, the
> resource-types branch performance was within 95% to 105% of branch-3.0
> performance.  We looked at the numbers for only the allocation time and
> node update event processing time, as the other numbers returned by SLS
> are not relevant here.  I'm not reporting specific numbers because of
> the volume of tests run, and because reporting any kind of aggregate
> result would be inherently skewed by the mix of tests we chose to run,
> and hence would be misleading.
>
> There were a few large node count+large queue count+large app count
> scenarios where resource-types showed a larger performance degradation
> versus branch-3.0 when comparing mean node update time over the entire
> run.  Mean is a lossy metric here, as we're trying to summarize an
> entire time series in a single number, but it's about the best we're
> gonna do.  While these results aren't encouraging, bear in mind that
> they are specifically for the time to process a node update, which does
> not necessarily translate directly into overall cluster performance.
>
> Wangda and Sunil did some independent testing with SLS and found no
> significant difference between branch-3.0 and resource-types.
>
> = Cluster Testing =
> Because of the large SLS scenarios that showed a performance
> degradation, we have done performance testing on actual clusters. These
> tests are still ongoing, but thus far the results have shown no
> discernible difference in overall throughput between branch-3.0 and
> resource-types.  Overall throughput for both branches falls into
> the same range.
>
> Daniel
>
> On 10/24/17 10:56 AM, Daniel Templeton wrote:
> > I'd like to formally start the voting process for merging the
> > resource-types branch into branch-3.0.  The resource-types branch is a
> > selective backport of JIRAs that were already merged into trunk in a
> > previous merge vote for YARN-3926 (resource types) [1].  For a full
> > explanation of the feature, benefits, and risks, see the previous
> > DISCUSS thread [2].  The vote will be 7 days, ending Tuesday Oct 31 at
> > 11:00AM PDT.
> >
> > In summary, resource types adds the ability to declaratively configure
> > new resource types in addition to CPU and memory and request them when
> > submitting resource requests.  The resource-types branch currently
> > represents 32 patches from trunk drawn from the resource types
> > umbrella JIRAs: YARN-3926 [3] and YARN-7069 [4].
> >
> > Key points:
> > * If no additional resource types are configured, the user experience
> > with YARN remains unchanged.
> > * Performance is the primary risk. We have been closely watching the
> > performance impact of adding resource types, and according to current
> > measurements the impact is trivial.
> > * This merge vote is for resource types excluding the resource
> > profiles feature which was included in the original merge vote [1].
> > * Documentation is available in trunk via YARN-7056 [5] with
> > improvements pending review in YARN-7369 [6].
> >
> > Refreshed performance numbers on the resource-types branch are
> > pending, and I'll post them to this thread as soon as they're ready.
> >
> > Thanks!
> > Daniel
> >
> > [1]
> >
> http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201708.mbox/%3ccad++ecm6xss4_kxp4audf85_rgg4pzxkuox7u2vp8tfzmy4...@mail.gmail.com%3E
> > [2]
> >
> http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201710.mbox/%3Caa2bcc6d-9d88-459d-63f4-5bb43e31f4f4%40cloudera.com%3E
> > [3] https://issues.apache.org/jira/browse/YARN-3926
> > [4] https://issues.apache.org/jira/browse/YARN-7069
> > [5] https://issues.apache.org/jira/browse/YARN-7056
> > [6] https://issues.apache.org/jira/browse/YARN-7369
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>
>
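
For anyone reading the relative numbers quoted above (for example 1.16
reported as "16% better", or SLS results "within 95% to 105%" of
branch-3.0), here is a minimal sketch of how such figures can be computed.
It assumes the ratio is candidate throughput divided by baseline
throughput, which the thread does not state explicitly. The function
names and the throughput values are hypothetical and were chosen only to
reproduce a 1.16 ratio.

```python
def relative_performance(candidate_ops_per_sec, baseline_ops_per_sec):
    """Return candidate/baseline; above 1.0 means the candidate is faster.

    Assumption: both inputs are throughputs (higher is better).
    """
    if baseline_ops_per_sec <= 0:
        raise ValueError("baseline throughput must be positive")
    return candidate_ops_per_sec / baseline_ops_per_sec


def within_band(ratio, low=0.95, high=1.05):
    """True if the ratio falls inside the 95% to 105% band."""
    return low <= ratio <= high


# Hypothetical throughputs, not measurements from this thread.
ratio = relative_performance(11600.0, 10000.0)
print(f"ratio={ratio:.2f} change={(ratio - 1) * 100:+.0f}%")
print("within 95%-105% band:", within_band(ratio))
```

If the underlying measurement is a latency rather than a throughput, the
ratio would need to be inverted before reading it as "better".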

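On the point quoted above that the mean is a lossy way to summarize a
whole time series of node update times: the sketch below shows two
hypothetical series with the same mean but very different tails. The
sample values are invented for illustration and are not SLS output, and
the nearest-rank percentile is just one simple choice among several.

```python
import statistics


def summarize(samples_ms):
    """Summarize a series of node update times given in milliseconds."""
    ordered = sorted(samples_ms)
    # Nearest-rank 95th percentile, using integer math to avoid float rounding.
    rank = max(1, (95 * len(ordered) + 99) // 100)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


# Hypothetical data: a steady series and a bursty one with two slow updates.
steady = [10.0] * 20
bursty = [5.0] * 18 + [55.0, 55.0]

for name, series in (("steady", steady), ("bursty", bursty)):
    print(name, summarize(series))
```

Both series report the same mean, so a comparison based on the mean alone
would call them equal even though the bursty one has occasional slow
updates.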