Hello Lars Volker, Bikramjeet Vig, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/14381
to look at the new patch set (#4).
Change subject: IMPALA-9015: improve mt_dop scan scheduling
......................................................................
IMPALA-9015: improve mt_dop scan scheduling
Implement longest-processing time algorithm for assigning scan ranges
to instances within a host. This is a standard algorithm that works
well in practice and solves some specific bugs in the current
algorithm.
The previous approach tended to assign multiple ranges to the
first instance and induce skew. E.g. if the ranges were
[3, 4, 5, 6] and it had 4 instances, it would assign
[3, 4], [5], [6], []. This also had the unfortunate consequence
that not all instances actually got allocated scan ranges,
making scheduling hard to reason about.
Testing:
Updated admission test to reflect this change.
Added a unit test for the core algorithm that checks directly
that it fixes that above problems.
Perf:
The algorithm is O(n log n) instead of O(n), where n is the
number of scan ranges allocated to a backend. This seems
worthwhile to get more even work distribution because
the payoff can be significant in may cases.
I ran a single node perf run of TPC-H scale 30 with
mt_dop=4 and saw significant perf improvements:
Report Generated on 2019-10-07
Run Description: "ccd741856d0e18be9c89087da71d9fc59f1c75ad vs
37d3df99e588b0e9080660c472c3bb06a08361ac"
Cluster Name: UNKNOWN
Lab Run Info: UNKNOWN
Impala Version: impalad version 3.4.0-SNAPSHOT RELEASE ()
Baseline Impala Version: impalad version 3.4.0-SNAPSHOT RELEASE ()
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) |
Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 8.51 | -3.78% | 5.70 | -4.07%
|
+----------+-----------------------+---------+------------+------------+----------------+
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| Workload | Query | File Format | Avg(s) | Base Avg(s) |
Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval |
Tval |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| TPCH(30) | TPCH-Q11 | parquet / none / none | 3.08 | 3.04 | +1.11%
| 0.72% | 1.09% | 5 | +1.52% | 0.58 | 1.88 |
| TPCH(30) | TPCH-Q2 | parquet / none / none | 1.83 | 1.83 | -0.01%
| 3.68% | 1.25% | 5 | +2.00% | 0.00 | -0.00 |
| TPCH(30) | TPCH-Q9 | parquet / none / none | 36.29 | 36.17 | +0.32%
| 2.22% | 2.94% | 5 | +0.56% | 0.00 | 0.20 |
| TPCH(30) | TPCH-Q8 | parquet / none / none | 5.80 | 5.93 | -2.06%
| 0.61% | 0.94% | 5 | -1.74% | -2.02 | -4.14 |
| TPCH(30) | TPCH-Q3 | parquet / none / none | 8.91 | 9.15 | -2.62%
| 3.54% | 3.96% | 5 | -2.17% | -0.87 | -1.12 |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.38 | 3.46 | -2.30%
| 0.82% | 0.68% | 5 | -2.54% | -2.31 | -4.89 |
| TPCH(30) | TPCH-Q6 | parquet / none / none | 1.62 | 1.67 | -2.73%
| 2.52% | 0.12% | 5 | -2.88% | -1.15 | -2.49 |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 3.67 | 3.78 | -3.06%
| 1.12% | 1.69% | 5 | -2.69% | -2.31 | -3.41 |
| TPCH(30) | TPCH-Q4 | parquet / none / none | 3.08 | 3.18 | -3.05%
| 1.22% | 0.32% | 5 | -2.86% | -2.31 | -5.57 |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 5.52 | 5.72 | -3.56%
| 1.35% | 1.02% | 5 | -3.60% | -2.31 | -4.81 |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 2.32 | 2.41 | -3.86%
| 1.04% | 1.64% | 5 | -3.99% | -2.02 | -4.50 |
| TPCH(30) | TPCH-Q7 | parquet / none / none | 24.22 | 25.27 | -4.15%
| 0.40% | 2.01% | 5 | -3.74% | -2.31 | -4.55 |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 8.64 | 8.99 | -3.97%
| 1.02% | 0.47% | 5 | -4.27% | -2.31 | -8.15 |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 3.38 | 3.53 | -4.28%
| 1.67% | 1.39% | 5 | -4.26% | -2.31 | -4.51 |
| TPCH(30) | TPCH-Q5 | parquet / none / none | 11.77 | 12.28 | -4.18%
| 0.72% | 0.94% | 5 | -4.62% | -2.31 | -8.06 |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 7.65 | 8.05 | -4.98%
| 1.03% | 2.09% | 5 | -4.68% | -2.31 | -4.82 |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 27.97 | 29.52 | I -5.27%
| 0.78% | 0.39% | 5 | I -5.16% | -2.31 | -14.17 |
| TPCH(30) | TPCH-Q1 | parquet / none / none | 4.92 | 5.24 | I -6.05%
| 3.59% | 2.32% | 5 | I -6.79% | -1.73 | -3.30 |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 4.56 | 4.92 | I -7.33%
| 1.55% | 0.90% | 5 | I -8.50% | -2.31 | -9.64 |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 2.09 | 2.28 | I -8.40%
| 1.95% | 1.85% | 5 | I -9.21% | -2.31 | -7.29 |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 10.83 | 11.84 | I -8.55%
| 0.53% | 0.57% | 5 | I -9.34% | -2.31 | -25.56 |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 5.63 | 6.22 | I -9.53%
| 0.75% | 0.73% | 5 | I -10.46% | -2.31 | -21.37 |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
Change-Id: I45ed2dab835efeb64bb74891cb43065894892682
---
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M
testdata/workloads/functional-query/queries/QueryTest/mt-dop-parquet-admission-slots.test
4 files changed, 185 insertions(+), 60 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/14381/4
--
To view, visit http://gerrit.cloudera.org:8080/14381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I45ed2dab835efeb64bb74891cb43065894892682
Gerrit-Change-Number: 14381
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>