Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15926 )
Change subject: IMPALA-9655: Dynamic intra-node load balancing for HDFS scans ...................................................................... IMPALA-9655: Dynamic intra-node load balancing for HDFS scans This patch ameliorates intra node execution skew for multithreaded HDFS scans by implementing a shared queue of scan ranges for all instances. Some points to highlight: - The scan node's PlanNode will go through all the TScanRanges assigned to all instances and aggregate them into file descriptors. - These files will be statically and evenly distributed among the instances. - The instances would then pick up their set of files and issue initial ranges by adding them to a shared queue. - Instances would then fetch their next range to read from this shared queue. - Other relevant data structures that will also be shared are: * remaining_scan_range_submissions_ * partition_template_tuple_map_ * per_file_metadata_ - Removed the longest-processing time (LPT) algorithm in the scheduler which tries to distribute scan load among instances on a host. This will have no effect after this patch since now the ranges are shared. - Added missing lifecycle events from MT scan nodes. This approach guarantees the following: - All shared data structures are allocated at the FragmentState level ensuring they'll stick around till the query is closed. - All instance local buffers for reading a scan range will be allocated after it has been taken off the shared queue. - Since the scheduler can assign ranges from the same file to different instances, aggregating them into file descriptors early will ensure the initial footer or header range (depending on file format) for it will only be issued once. Limitation: - For HDFS MT scans we lose out on the optimization within ReaderContext which ensures cached ranges are read first. - As a compromise, ranges that are marked to use the hdfs cache are added to the front of the shared queue. Testing: - Added a regression test in test_mt_dop which would almost always fail without this patch - Added some test coverage for mt scans where parquet files are split across blocks and scanning tables with mixed file formats. - Ran core tests on ASAN build - Ran exhaustive tests Performance: No significant perf change. Ran TPCH (scale 30) with mt_dop set to 4 on a 3 node minicluster on my desktop. +----------+-----------------------+---------+------------+------------+----------------+ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +----------+-----------------------+---------+------------+------------+----------------+ | TPCH(30) | parquet / none / none | 5.85 | +0.35% | 4.35 | +0.60% | +----------+-----------------------+---------+------------+------------+----------------+ +----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+-------+ | Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+-------+ | TPCH(30) | TPCH-Q15 | parquet / none / none | 4.46 | 4.30 | +3.63% | 1.64% | 2.23% | 30 | +3.56% | 5.28 | 7.09 | | TPCH(30) | TPCH-Q13 | parquet / none / none | 9.51 | 9.23 | +3.12% | 1.19% | 0.79% | 30 | +2.84% | 6.48 | 11.69 | | TPCH(30) | TPCH-Q2 | parquet / none / none | 2.37 | 2.32 | +1.89% | 3.76% | 3.26% | 30 | +2.16% | 1.97 | 2.06 | | TPCH(30) | TPCH-Q9 | parquet / none / none | 16.84 | 16.55 | +1.77% | 1.24% | 0.57% | 30 | +1.62% | 5.75 | 6.99 | | TPCH(30) | TPCH-Q19 | parquet / none / none | 3.65 | 3.60 | +1.43% | 2.19% | 1.72% | 30 | +1.39% | 2.53 | 2.79 | | TPCH(30) | TPCH-Q18 | parquet / none / none | 12.64 | 12.48 | +1.33% | 1.51% | 1.53% | 30 | +1.28% | 2.67 | 3.36 | | TPCH(30) | TPCH-Q14 | parquet / none / none | 2.94 | 2.91 | +1.03% | 1.91% | 1.77% | 30 | +1.48% | 1.98 | 2.16 | | TPCH(30) | TPCH-Q6 | parquet / none / none | 1.31 | 1.29 | +1.44% | 5.48% | 3.66% | 30 | +0.43% | 1.10 | 1.18 | | TPCH(30) | TPCH-Q7 | parquet / none / none | 5.49 | 5.44 | +0.78% | 0.59% | 1.16% | 30 | +0.95% | 3.49 | 3.27 | | TPCH(30) | TPCH-Q8 | parquet / none / none | 5.40 | 5.35 | +0.83% | 0.95% | 0.95% | 30 | +0.83% | 2.81 | 3.37 | | TPCH(30) | TPCH-Q20 | parquet / none / none | 2.82 | 2.81 | +0.50% | 1.58% | 1.72% | 30 | +0.30% | 1.22 | 1.16 | | TPCH(30) | TPCH-Q11 | parquet / none / none | 1.43 | 1.42 | +0.64% | 2.42% | 1.96% | 30 | +0.09% | 0.68 | 1.13 | | TPCH(30) | TPCH-Q4 | parquet / none / none | 2.59 | 2.58 | +0.28% | 1.54% | 1.99% | 30 | +0.23% | 1.17 | 0.60 | | TPCH(30) | TPCH-Q22 | parquet / none / none | 2.17 | 2.17 | +0.15% | 3.04% | 3.68% | 30 | +0.15% | 0.30 | 0.18 | | TPCH(30) | TPCH-Q16 | parquet / none / none | 2.02 | 2.02 | +0.21% | 2.73% | 2.85% | 30 | +0.07% | 0.26 | 0.30 | | TPCH(30) | TPCH-Q5 | parquet / none / none | 4.16 | 4.15 | +0.04% | 4.49% | 4.38% | 30 | -0.03% | -0.13 | 0.03 | | TPCH(30) | TPCH-Q3 | parquet / none / none | 4.75 | 4.76 | -0.28% | 1.34% | 1.47% | 30 | -0.10% | -0.57 | -0.77 | | TPCH(30) | TPCH-Q10 | parquet / none / none | 4.92 | 4.95 | -0.52% | 1.45% | 2.71% | 30 | +0.03% | 0.12 | -0.94 | | TPCH(30) | TPCH-Q12 | parquet / none / none | 2.82 | 2.85 | -0.78% | 1.50% | 1.68% | 30 | -0.65% | -1.66 | -1.91 | | TPCH(30) | TPCH-Q1 | parquet / none / none | 4.49 | 4.53 | -0.97% | 2.26% | 2.69% | 30 | -0.68% | -0.99 | -1.52 | | TPCH(30) | TPCH-Q17 | parquet / none / none | 10.11 | 10.19 | -0.85% | 3.27% | 3.71% | 30 | -0.84% | -0.85 | -0.94 | | TPCH(30) | TPCH-Q21 | parquet / none / none | 21.75 | 22.28 | -2.37% | 3.51% | 7.00% | 30 | -0.88% | -1.35 | -1.66 | +----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+-------+ Change-Id: I9a101d0d98dff6e3779f85bc466e4c0bdb38094b Reviewed-on: http://gerrit.cloudera.org:8080/15926 Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com> --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-orc-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node-mt.cc M be/src/exec/hdfs-scan-node-mt.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-text-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/scheduling/scheduler-test.cc M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M tests/query_test/test_mt_dop.py M tests/query_test/test_scanners.py 19 files changed, 702 insertions(+), 415 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/15926 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I9a101d0d98dff6e3779f85bc466e4c0bdb38094b Gerrit-Change-Number: 15926 Gerrit-PatchSet: 11 Gerrit-Owner: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>