My +1 (binding) brings us to three +1's and no -1's.  The vote is now closed, and the merge is approved.  I'll proceed with the merge.  The code should be in by this afternoon.

Daniel

On 10/28/17 9:39 AM, Sunil G wrote:
+1 (binding)

Thanks, Daniel, for helping to backport this. I also ran various performance test cases, including the unit test perf runs mentioned and the SLS tests.

In the SLS tests, I found that the performance difference between branch-3.0 and the resource-types branch is minimal. I ran test scenarios with 8k nodes and 4k nodes. I saw no performance regressions when I used 2 resource types. I could get around 2800 container allocations per second on my machine with 8k nodes. I have also gone through the branch code and trunk, and I could see that all major changes related to the recent performance improvements are pulled in.

- Sunil


On Sat, Oct 28, 2017 at 8:20 PM Daniel Templeton <dan...@cloudera.com> wrote:

    As promised, here are the updated performance numbers.

    Performance reporting is always a tricky business.  I'll do my best
    here to fairly represent the state of things.  We've run a number of
    performance tests.  Those tests include TestCapacitySchedulerPerf,
    SLS, and actual cluster testing.

    The summary is that in most scenarios, the resource-types branch is
    very close to branch-3.0 in performance.  There are some large scale
    SLS tests that show a performance drop, but we have not been able to
    replicate those findings on an actual cluster.  Additional cluster
    testing is still in progress.

    = TestCapacitySchedulerPerf =
    This unit test added with YARN-7136 does a tight loop over the
    scheduler's handling of node update events.  The net effect is similar
    to running 100 apps through 1 queue in a 2-node cluster.  I also
    modified it to run with fair scheduler and configured it with assign
    multiple enabled and set to the max containers supported by the
    cluster.

    - Capacity scheduler -
    Performance of resource-types v/s branch-3.0: 1.0 (no change)
    Performance of resource-types v/s trunk: 1.16 (16% *better*)
    - Fair scheduler -
    Performance of resource-types + YARN-7374 v/s branch-3.0: 1.25 (25% *better*)
    Performance of resource-types + YARN-7374 v/s trunk: 1.04 (4% *better*)
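
    For anyone reading the ratios above, here is a minimal sketch of how
    such a number can be interpreted.  It assumes the ratio is candidate
    throughput divided by baseline throughput, so that values above 1.0
    mean the candidate is faster.  The function names and the sample
    throughput values are hypothetical and are not taken from the test.

```python
def relative_performance(candidate_ops_per_sec, baseline_ops_per_sec):
    """Return candidate throughput divided by baseline throughput.

    Assumption: higher throughput is better, so a ratio above 1.0 means
    the candidate branch outperformed the baseline branch.
    """
    if baseline_ops_per_sec <= 0:
        raise ValueError("baseline throughput must be positive")
    return candidate_ops_per_sec / baseline_ops_per_sec


def describe(ratio):
    """Turn a ratio such as 1.16 into a label such as '16% better'."""
    percent = round((ratio - 1.0) * 100)
    if percent == 0:
        return "no change"
    direction = "better" if percent > 0 else "worse"
    return f"{abs(percent)}% {direction}"


# Hypothetical throughput values, chosen only to illustrate the arithmetic.
ratio = relative_performance(2900.0, 2500.0)
print(f"{ratio:.2f} ({describe(ratio)})")
```

    With this convention, a ratio of 1.0 reads as no change and a ratio
    below 1.0 reads as a regression.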

    These results seem a little optimistic when compared with the SLS
    results, but at worst they provide evidence that the resource types
    changes do not have a significant negative impact.

    Wangda and Sunil did some independent testing with this unit test and
    found no significant difference between branch-3.0 and resource-types.

    = SLS =
    For SLS, we tested a wide range of scenarios with different node, app,
    task, and queue counts.  We ran these tests for capacity and fair
    scheduler.

    The net result is that for the majority of the scenarios we tested,
    the resource-types branch performance was within 95% to 105% of
    branch-3.0 performance.  We looked at the numbers for only the
    allocation time and node update event processing time, as the other
    numbers returned by SLS are not relevant here.  I'm not reporting
    specific numbers because of the volume of tests run, and because
    reporting any kind of aggregate result would be inherently skewed by
    the mix of tests we chose to run, and hence would be misleading.

    There were a few large node count+large queue count+large app count
    scenarios where resource-types showed a larger performance degradation
    versus branch-3.0 when comparing mean node update time over the entire
    run.  Mean is a lossy metric here, as we're trying to summarize an
    entire time series in a single number, but it's about the best we're
    gonna do.  While these results aren't encouraging, bear in mind that
    they are specifically for the time to process a node update, which
    does not necessarily translate directly into overall cluster
    performance.
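
    To illustrate why the mean is a lossy summary of a time series, here
    is a small sketch.  The two series below are made-up node update
    times in milliseconds, not SLS output, and the helper name is
    hypothetical.  They share the same mean even though one of them has
    a single large spike.

```python
import statistics


def summarize(update_times_ms):
    """Collapse a series of node update times into a few summary values."""
    return {
        "mean": statistics.mean(update_times_ms),
        "median": statistics.median(update_times_ms),
        "max": max(update_times_ms),
    }


# Hypothetical data: a steady run and a mostly faster run with one spike.
steady = [10] * 10
spiky = [5] * 9 + [55]

steady_summary = summarize(steady)
spiky_summary = summarize(spiky)

# Both series have the same mean, but the median and max tell them apart.
print(steady_summary)
print(spiky_summary)
```

    Comparing only the means would treat these two runs as identical,
    which is the kind of information loss described above.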

    Wangda and Sunil did some independent testing with SLS and found no
    significant difference between branch-3.0 and resource-types.

    = Cluster Testing =
    Because of the large SLS scenarios that showed a performance
    degradation, we have done performance testing on actual clusters.
    These tests are still ongoing, but thus far the results have shown no
    discernible difference in overall throughput between branch-3.0 and
    resource-types.  Overall throughput for both branches falls into the
    same range.

    Daniel

    On 10/24/17 10:56 AM, Daniel Templeton wrote:
    > I'd like to formally start the voting process for merging the
    > resource-types branch into branch-3.0.  The resource-types branch is
    > a selective backport of JIRAs that were already merged into trunk in
    > a previous merge vote for YARN-3926 (resource types) [1]. For a full
    > explanation of the feature, benefits, and risks, see the previous
    > DISCUSS thread [2].  The vote will be 7 days, ending Tuesday Oct 31
    > at 11:00AM PDT.
    >
    > In summary, resource types adds the ability to declaratively
    > configure new resource types in addition to CPU and memory and
    > request them when submitting resource requests.  The resource-types
    > branch currently represents 32 patches from trunk drawn from the
    > resource types umbrella JIRAs: YARN-3926 [3] and YARN-7069 [4].
    >
    > Key points:
    > * If no additional resource types are configured, the user
    > experience with YARN remains unchanged.
    > * Performance is the primary risk. We have been closely watching the
    > performance impact of adding resource types, and according to
    > current measurements the impact is trivial.
    > * This merge vote is for resource types excluding the resource
    > profiles feature which was included in the original merge vote [1].
    > * Documentation is available in trunk via YARN-7056 [5] with
    > improvements pending review in YARN-7369 [6].
    >
    > Refreshed performance numbers on the resource-types branch are
    > pending, and I'll post them to this thread as soon as they're ready.
    >
    > Thanks!
    > Daniel
    >
    > [1] http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201708.mbox/%3ccad++ecm6xss4_kxp4audf85_rgg4pzxkuox7u2vp8tfzmy4...@mail.gmail.com%3E
    > [2] http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201710.mbox/%3Caa2bcc6d-9d88-459d-63f4-5bb43e31f4f4%40cloudera.com%3E
    > [3] https://issues.apache.org/jira/browse/YARN-3926
    > [4] https://issues.apache.org/jira/browse/YARN-7069
    > [5] https://issues.apache.org/jira/browse/YARN-7056
    > [6] https://issues.apache.org/jira/browse/YARN-7369



