[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 20: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 20 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 17 Mar 2020 23:29:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. IMPALA-9156: share broadcast join builds The scheduler will only create one join build finstance per backend in cases where this is supported. The builder is aware of the number of finstances executing the probe and hands off the build data structures to the builders. Nested loop join requires minimal modifications because the build data structures are read-only after initial construction. The only significant change is that memory can't be transferred to the multiple consumers, so MarkNeedsDeepCopy() needs to be used instead. Hash join requires additional synchronisation because the spilling algorithm mutates build-side data structures. This patch adds synchronisation so that rebuilding spilled partitions is done in a thread-safe manner, using a single thread. This uses the CyclicBarrier added in an earlier patch. Threads blocked on CyclicBarrier need to be cancellable, which is handled by cancelling the barrier when cancelling fragments on the backend. BufferPool now correctly handles multiple threads calling CleanPages() concurrently, which makes other methods thread-safe. Update planner to cost broadcast join and estimate memory consumption based on a single instance per node. Planner estimates of number of instances are improved. Instead of assuming mt_dop instances per node, use the total number of input splits (also called scan ranges in places) as an upper bound on the number of instances generated by scans. These instance estimates from the scan nodes are then propagated up the plan tree in the same way as the numNodes estimates. The instance estimate for the join build fragment is fixed to be based on the destination fragment. The profile now correctly accounts for time waiting for the builder, counting it in inactive time and showing it in the node timeline. Additional improvements/cleanup to the time accounting are deferring until IMPALA-9422. Testing: * Updated planner tests * Ran a single node stress test with TPC-H and TPC-DS * Add a targeted test for spilling broadcast joins, both repartitioning and not repartitioning. * Add a targeted test for a spilling broadcast join with empty probe * Add a targeted test for spilling broadcast join with empty build partitions. * Add a broadcast join to test_cancellation and test_failpoints. Perf: I did a single node run on my desktop: +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(30) | parquet / none / none | 6.26| -15.70%| 4.63 | -16.16%| +--+---+-++++ +--+--+---++-++---++---++-+-+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval| +--+--+---++-++---++---++-+-+ | TPCH(30) | TPCH-Q21 | parquet / none / none | 24.97 | 23.25 | R +7.38% | 0.51% | 0.22%| 5 | R +6.95% | 2.31| 27.93 | | TPCH(30) | TPCH-Q4 | parquet / none / none | 2.83 | 2.79| +1.31% | 1.86% | 0.36%| 5 | +1.88% | 1.15| 1.53| | TPCH(30) | TPCH-Q6 | parquet / none / none | 1.28 | 1.28| -0.01% | 1.64% | 1.63%| 5 | -0.11% | -0.58 | -0.01 | | TPCH(30) | TPCH-Q22 | parquet / none / none | 2.65 | 2.68| -0.94% | 0.84% | 1.46%| 5 | -0.21% | -0.87 | -1.25 | | TPCH(30) | TPCH-Q1 | parquet / none / none | 4.69 | 4.72| -0.56% | 1.29% | 0.52%| 5 | -1.04% | -1.15 | -0.89 | | TPCH(30) | TPCH-Q13 | parquet / none / none | 10.64 | 10.80 | -1.48% | 0.61% | 0.60%| 5 | -1.39% | -1.73 | -3.91 | | TPCH(30) | TPCH-Q15 | parquet / none / none | 4.11 | 4.32| -4.92% | 0.05% | 0.40%| 5 | -4.93% | -2.31 | -27.46 | | TPCH(30) | TPCH-Q20 | parquet / none / none | 3.47 | 3.67| I -5.41% | 0.81% | 0.03%| 5 | I -5.70% | -2.31 | -15.75 | | TPCH(30) | TPCH-Q17 | parquet / none / none | 7.58 | 8.14| I -6.93% | 3.13% | 2.62%| 5 | I -9.31% | -2.02 | -3.96 | | TPCH(30) | TPCH-Q9 | parquet / none / none | 15.59
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 19: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5512/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 19 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 17 Mar 2020 19:48:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 19: Code-Review+2 Added a DCHECK based on feedback from csaba, fixed the long line. -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 19 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 17 Mar 2020 19:01:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 18: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5484/ -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 18 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 17 Mar 2020 19:02:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 20: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 20 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 17 Mar 2020 19:02:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 20: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5486/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 20 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 17 Mar 2020 19:02:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Hello Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15096 to look at the new patch set (#19). Change subject: IMPALA-9156: share broadcast join builds .. IMPALA-9156: share broadcast join builds The scheduler will only create one join build finstance per backend in cases where this is supported. The builder is aware of the number of finstances executing the probe and hands off the build data structures to the builders. Nested loop join requires minimal modifications because the build data structures are read-only after initial construction. The only significant change is that memory can't be transferred to the multiple consumers, so MarkNeedsDeepCopy() needs to be used instead. Hash join requires additional synchronisation because the spilling algorithm mutates build-side data structures. This patch adds synchronisation so that rebuilding spilled partitions is done in a thread-safe manner, using a single thread. This uses the CyclicBarrier added in an earlier patch. Threads blocked on CyclicBarrier need to be cancellable, which is handled by cancelling the barrier when cancelling fragments on the backend. BufferPool now correctly handles multiple threads calling CleanPages() concurrently, which makes other methods thread-safe. Update planner to cost broadcast join and estimate memory consumption based on a single instance per node. Planner estimates of number of instances are improved. Instead of assuming mt_dop instances per node, use the total number of input splits (also called scan ranges in places) as an upper bound on the number of instances generated by scans. These instance estimates from the scan nodes are then propagated up the plan tree in the same way as the numNodes estimates. The instance estimate for the join build fragment is fixed to be based on the destination fragment. The profile now correctly accounts for time waiting for the builder, counting it in inactive time and showing it in the node timeline. Additional improvements/cleanup to the time accounting are deferring until IMPALA-9422. Testing: * Updated planner tests * Ran a single node stress test with TPC-H and TPC-DS * Add a targeted test for spilling broadcast joins, both repartitioning and not repartitioning. * Add a targeted test for a spilling broadcast join with empty probe * Add a targeted test for spilling broadcast join with empty build partitions. * Add a broadcast join to test_cancellation and test_failpoints. Perf: I did a single node run on my desktop: +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(30) | parquet / none / none | 6.26| -15.70%| 4.63 | -16.16%| +--+---+-++++ +--+--+---++-++---++---++-+-+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval| +--+--+---++-++---++---++-+-+ | TPCH(30) | TPCH-Q21 | parquet / none / none | 24.97 | 23.25 | R +7.38% | 0.51% | 0.22%| 5 | R +6.95% | 2.31| 27.93 | | TPCH(30) | TPCH-Q4 | parquet / none / none | 2.83 | 2.79| +1.31% | 1.86% | 0.36%| 5 | +1.88% | 1.15| 1.53| | TPCH(30) | TPCH-Q6 | parquet / none / none | 1.28 | 1.28| -0.01% | 1.64% | 1.63%| 5 | -0.11% | -0.58 | -0.01 | | TPCH(30) | TPCH-Q22 | parquet / none / none | 2.65 | 2.68| -0.94% | 0.84% | 1.46%| 5 | -0.21% | -0.87 | -1.25 | | TPCH(30) | TPCH-Q1 | parquet / none / none | 4.69 | 4.72| -0.56% | 1.29% | 0.52%| 5 | -1.04% | -1.15 | -0.89 | | TPCH(30) | TPCH-Q13 | parquet / none / none | 10.64 | 10.80 | -1.48% | 0.61% | 0.60%| 5 | -1.39% | -1.73 | -3.91 | | TPCH(30) | TPCH-Q15 | parquet / none / none | 4.11 | 4.32| -4.92% | 0.05% | 0.40%| 5 | -4.93% | -2.31 | -27.46 | | TPCH(30) | TPCH-Q20 | parquet / none / none | 3.47 | 3.67| I -5.41% | 0.81% | 0.03%| 5 | I -5.70% | -2.31 | -15.75 | | TPCH(30) | TPCH-Q17 | parquet / none / none | 7.58 | 8.14| I -6.93% | 3.13% | 2.62%| 5 | I -9.31%
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 18: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 18 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 17 Mar 2020 18:49:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 17: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5507/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 17 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 17 Mar 2020 16:46:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 18: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5484/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 18 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 17 Mar 2020 16:02:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 17: (1 comment) http://gerrit.cloudera.org:8080/#/c/15096/17/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/15096/17/be/src/exec/partitioned-hash-join-builder.cc@188 PS17, Line 188: // the AddBarrierToCancel() mechanism ensures that cancellation happens after the overall line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 17 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 17 Mar 2020 16:02:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 18: Code-Review+1 I'm going to carry as a +1. Just checking with Csaba if he's going to look again. -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 18 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 17 Mar 2020 16:01:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Hello Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15096 to look at the new patch set (#17). Change subject: IMPALA-9156: share broadcast join builds .. IMPALA-9156: share broadcast join builds The scheduler will only create one join build finstance per backend in cases where this is supported. The builder is aware of the number of finstances executing the probe and hands off the build data structures to the builders. Nested loop join requires minimal modifications because the build data structures are read-only after initial construction. The only significant change is that memory can't be transferred to the multiple consumers, so MarkNeedsDeepCopy() needs to be used instead. Hash join requires additional synchronisation because the spilling algorithm mutates build-side data structures. This patch adds synchronisation so that rebuilding spilled partitions is done in a thread-safe manner, using a single thread. This uses the CyclicBarrier added in an earlier patch. Threads blocked on CyclicBarrier need to be cancellable, which is handled by cancelling the barrier when cancelling fragments on the backend. BufferPool now correctly handles multiple threads calling CleanPages() concurrently, which makes other methods thread-safe. Update planner to cost broadcast join and estimate memory consumption based on a single instance per node. Planner estimates of number of instances are improved. Instead of assuming mt_dop instances per node, use the total number of input splits (also called scan ranges in places) as an upper bound on the number of instances generated by scans. These instance estimates from the scan nodes are then propagated up the plan tree in the same way as the numNodes estimates. The instance estimate for the join build fragment is fixed to be based on the destination fragment. The profile now correctly accounts for time waiting for the builder, counting it in inactive time and showing it in the node timeline. Additional improvements/cleanup to the time accounting are deferring until IMPALA-9422. Testing: * Updated planner tests * Ran a single node stress test with TPC-H and TPC-DS * Add a targeted test for spilling broadcast joins, both repartitioning and not repartitioning. * Add a targeted test for a spilling broadcast join with empty probe * Add a targeted test for spilling broadcast join with empty build partitions. * Add a broadcast join to test_cancellation and test_failpoints. Perf: I did a single node run on my desktop: +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(30) | parquet / none / none | 6.26| -15.70%| 4.63 | -16.16%| +--+---+-++++ +--+--+---++-++---++---++-+-+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval| +--+--+---++-++---++---++-+-+ | TPCH(30) | TPCH-Q21 | parquet / none / none | 24.97 | 23.25 | R +7.38% | 0.51% | 0.22%| 5 | R +6.95% | 2.31| 27.93 | | TPCH(30) | TPCH-Q4 | parquet / none / none | 2.83 | 2.79| +1.31% | 1.86% | 0.36%| 5 | +1.88% | 1.15| 1.53| | TPCH(30) | TPCH-Q6 | parquet / none / none | 1.28 | 1.28| -0.01% | 1.64% | 1.63%| 5 | -0.11% | -0.58 | -0.01 | | TPCH(30) | TPCH-Q22 | parquet / none / none | 2.65 | 2.68| -0.94% | 0.84% | 1.46%| 5 | -0.21% | -0.87 | -1.25 | | TPCH(30) | TPCH-Q1 | parquet / none / none | 4.69 | 4.72| -0.56% | 1.29% | 0.52%| 5 | -1.04% | -1.15 | -0.89 | | TPCH(30) | TPCH-Q13 | parquet / none / none | 10.64 | 10.80 | -1.48% | 0.61% | 0.60%| 5 | -1.39% | -1.73 | -3.91 | | TPCH(30) | TPCH-Q15 | parquet / none / none | 4.11 | 4.32| -4.92% | 0.05% | 0.40%| 5 | -4.93% | -2.31 | -27.46 | | TPCH(30) | TPCH-Q20 | parquet / none / none | 3.47 | 3.67| I -5.41% | 0.81% | 0.03%| 5 | I -5.70% | -2.31 | -15.75 | | TPCH(30) | TPCH-Q17 | parquet / none / none | 7.58 | 8.14| I -6.93% | 3.13% | 2.62%| 5 | I -9.31%
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 16: (4 comments) http://gerrit.cloudera.org:8080/#/c/15096/16/be/src/exec/partitioned-hash-join-builder.h File be/src/exec/partitioned-hash-join-builder.h: http://gerrit.cloudera.org:8080/#/c/15096/16/be/src/exec/partitioned-hash-join-builder.h@650 PS16, Line 650: anr > nit: and Done http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/partitioned-hash-join-builder.cc@708 PS15, Line 708: probe_barrier_->Wait > Agree this is an issue. I think there are multiple issues like this and I wanted to revisit the timing as a separate follow-up JIRA - IMPALA-9422. In this specific case I'm not sure if it's better to count the time against the builder or if that would be misleading. http://gerrit.cloudera.org:8080/#/c/15096/16/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/15096/16/fe/src/main/java/org/apache/impala/planner/JoinNode.java@193 PS16, Line 193: fragments > nit: fragment Done http://gerrit.cloudera.org:8080/#/c/15096/16/tests/query_test/test_spilling.py File tests/query_test/test_spilling.py: http://gerrit.cloudera.org:8080/#/c/15096/16/tests/query_test/test_spilling.py@150 PS16, Line 150: splitsb > nit: typo Done -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 16 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 17 Mar 2020 16:01:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 16: Code-Review+2 (6 comments) http://gerrit.cloudera.org:8080/#/c/15096/16/be/src/exec/partitioned-hash-join-builder.h File be/src/exec/partitioned-hash-join-builder.h: http://gerrit.cloudera.org:8080/#/c/15096/16/be/src/exec/partitioned-hash-join-builder.h@650 PS16, Line 650: anr nit: and http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/partitioned-hash-join-builder.cc@708 PS15, Line 708: probe_barrier_->Wait > maybe add this wait time to the join node's idle/wait time. This would help http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/runtime/runtime-state.cc File be/src/runtime/runtime-state.cc: http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/runtime/runtime-state.cc@268 PS15, Line 268: for (CyclicBarrier* cb2 : cancellation_cbs_) { : // Don't add if already present. : if (cb == cb2) return; : } > I figured that this would always be small enough that the linear search wou makes sense http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/util/cyclic-barrier.h File be/src/util/cyclic-barrier.h: http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/util/cyclic-barrier.h@46 PS15, Line 46: template > This is actually a best practice, cause lambda functions have an "un-nameab Got it. Thanks for the explanation http://gerrit.cloudera.org:8080/#/c/15096/16/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/15096/16/fe/src/main/java/org/apache/impala/planner/JoinNode.java@193 PS16, Line 193: fragments nit: fragment http://gerrit.cloudera.org:8080/#/c/15096/16/tests/query_test/test_spilling.py File tests/query_test/test_spilling.py: http://gerrit.cloudera.org:8080/#/c/15096/16/tests/query_test/test_spilling.py@150 PS16, Line 150: splitsb nit: typo -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 16 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 17 Mar 2020 03:14:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5495/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 16 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 14 Mar 2020 01:10:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 16: (1 comment) http://gerrit.cloudera.org:8080/#/c/15096/16/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/15096/16/be/src/exec/partitioned-hash-join-builder.cc@188 PS16, Line 188: // the AddBarrierToCancel() mechanism ensures that cancellation happens after the overall line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 16 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 14 Mar 2020 00:25:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 15: (10 comments) http://gerrit.cloudera.org:8080/#/c/15096/15//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15096/15//COMMIT_MSG@42 PS15, Line 42: was > nit: way Done http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/join-builder.cc File be/src/exec/join-builder.cc: http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/join-builder.cc@123 PS15, Line 123: // Don't let probe side pick up the builder when we're going to clean it up. : // Query cancellation will propagate to the probe finstance. : ready_to_probe_ = !build_side_state->is_cancelled(); : VLOG(2) << "JoinBuilder (id=" << join_node_id_ << ")" : << " all probes complete. cancelled=" << build_side_state->is_cancelled() : << " outstanding_probes_=" << outstanding_probes_; > do we still need this? Done http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/partitioned-hash-join-builder.h File be/src/exec/partitioned-hash-join-builder.h: http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/partitioned-hash-join-builder.h@648 PS15, Line 648: /// Calculates the amount of memory to be transferred for probe streams when probing > nit: per probe thread / join node instance Done http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/partitioned-hash-join-builder.cc@182 PS15, Line 182: > nit: extra space Done http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/partitioned-hash-join-node.cc@1210 PS15, Line 1210: spilled_partitions_.emplace( : build_partition->id(), std::move(probe_hash_partitions_[i])); > nit: might be useful to mention in the comment for spilled_partitions_ in t Done http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/runtime/runtime-state.cc File be/src/runtime/runtime-state.cc: http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/runtime/runtime-state.cc@268 PS15, Line 268: for (CyclicBarrier* cb2 : cancellation_cbs_) { : // Don't add if already present. : if (cb == cb2) return; : } > nit: would it be worth using an unordered_set? or will that be an overkill? I figured that this would always be small enough that the linear search wouldn't be an issue. Plus it avoids including unordered_set, etc in the header and dragging that in. http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/util/cyclic-barrier.h File be/src/util/cyclic-barrier.h: http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/util/cyclic-barrier.h@46 PS15, Line 46: template > can we instead use a typedef to define the function signature in order to e This is actually a best practice, cause lambda functions have an "un-nameable" type that encodes all of the parameters, lambda captures, etc. If we used std::function the lambda would get converted to that less-efficient representation, which usually requires a heap allocation. https://www.drdobbs.com/cpp/efficient-use-of-lambda-expressions-and/232500059 http://gerrit.cloudera.org:8080/#/c/15096/15/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/15096/15/fe/src/main/java/org/apache/impala/planner/JoinNode.java@193 PS15, Line 193: fragments > nit: fragment instances Done http://gerrit.cloudera.org:8080/#/c/15096/15/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/15096/15/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@371 PS15, Line 371: One instance is scheduled per instance of the fragment > nit: slightly ambiguous because of the "instance" terminology, how about: " That's clearer. http://gerrit.cloudera.org:8080/#/c/15096/15/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/15096/15/fe/src/main/java/org/apache/impala/planner/PlanNode.java@130 PS15, Line 130: // Estimated number of instances that the scheduler would generate for a fragment with > nit: across all nodes Done -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Hello Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15096 to look at the new patch set (#16). Change subject: IMPALA-9156: share broadcast join builds .. IMPALA-9156: share broadcast join builds The scheduler will only create one join build finstance per backend in cases where this is supported. The builder is aware of the number of finstances executing the probe and hands off the build data structures to the builders. Nested loop join requires minimal modifications because the build data structures are read-only after initial construction. The only significant change is that memory can't be transferred to the multiple consumers, so MarkNeedsDeepCopy() needs to be used instead. Hash join requires additional synchronisation because the spilling algorithm mutates build-side data structures. This patch adds synchronisation so that rebuilding spilled partitions is done in a thread-safe manner, using a single thread. This uses the CyclicBarrier added in an earlier patch. Threads blocked on CyclicBarrier need to be cancellable, which is handled by cancelling the barrier when cancelling fragments on the backend. BufferPool now correctly handles multiple threads calling CleanPages() concurrently, which makes other methods thread-safe. Update planner to cost broadcast join and estimate memory consumption based on a single instance per node. Planner estimates of number of instances are improved. Instead of assuming mt_dop instances per node, use the total number of input splits (also called scan ranges in places) as an upper bound on the number of instances generated by scans. These instance estimates from the scan nodes are then propagated up the plan tree in the same way as the numNodes estimates. The instance estimate for the join build fragment is fixed to be based on the destination fragment. The profile now correctly accounts for time waiting for the builder, counting it in inactive time and showing it in the node timeline. Additional improvements/cleanup to the time accounting are deferring until IMPALA-9422. Testing: * Updated planner tests * Ran a single node stress test with TPC-H and TPC-DS * Add a targeted test for spilling broadcast joins, both repartitioning and not repartitioning. * Add a targeted test for a spilling broadcast join with empty probe * Add a targeted test for spilling broadcast join with empty build partitions. * Add a broadcast join to test_cancellation and test_failpoints. Perf: I did a single node run on my desktop: +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(30) | parquet / none / none | 6.26| -15.70%| 4.63 | -16.16%| +--+---+-++++ +--+--+---++-++---++---++-+-+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval| +--+--+---++-++---++---++-+-+ | TPCH(30) | TPCH-Q21 | parquet / none / none | 24.97 | 23.25 | R +7.38% | 0.51% | 0.22%| 5 | R +6.95% | 2.31| 27.93 | | TPCH(30) | TPCH-Q4 | parquet / none / none | 2.83 | 2.79| +1.31% | 1.86% | 0.36%| 5 | +1.88% | 1.15| 1.53| | TPCH(30) | TPCH-Q6 | parquet / none / none | 1.28 | 1.28| -0.01% | 1.64% | 1.63%| 5 | -0.11% | -0.58 | -0.01 | | TPCH(30) | TPCH-Q22 | parquet / none / none | 2.65 | 2.68| -0.94% | 0.84% | 1.46%| 5 | -0.21% | -0.87 | -1.25 | | TPCH(30) | TPCH-Q1 | parquet / none / none | 4.69 | 4.72| -0.56% | 1.29% | 0.52%| 5 | -1.04% | -1.15 | -0.89 | | TPCH(30) | TPCH-Q13 | parquet / none / none | 10.64 | 10.80 | -1.48% | 0.61% | 0.60%| 5 | -1.39% | -1.73 | -3.91 | | TPCH(30) | TPCH-Q15 | parquet / none / none | 4.11 | 4.32| -4.92% | 0.05% | 0.40%| 5 | -4.93% | -2.31 | -27.46 | | TPCH(30) | TPCH-Q20 | parquet / none / none | 3.47 | 3.67| I -5.41% | 0.81% | 0.03%| 5 | I -5.70% | -2.31 | -15.75 | | TPCH(30) | TPCH-Q17 | parquet / none / none | 7.58 | 8.14| I -6.93% | 3.13% | 2.62%| 5 | I -9.31%
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 15: (11 comments) Looks good, mostly nits, have yet to take a look at the tests http://gerrit.cloudera.org:8080/#/c/15096/15//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15096/15//COMMIT_MSG@42 PS15, Line 42: was nit: way http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/join-builder.cc File be/src/exec/join-builder.cc: http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/join-builder.cc@123 PS15, Line 123: // Don't let probe side pick up the builder when we're going to clean it up. : // Query cancellation will propagate to the probe finstance. : ready_to_probe_ = !build_side_state->is_cancelled(); : VLOG(2) << "JoinBuilder (id=" << join_node_id_ << ")" : << " all probes complete. cancelled=" << build_side_state->is_cancelled() : << " outstanding_probes_=" << outstanding_probes_; do we still need this? http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/partitioned-hash-join-builder.h File be/src/exec/partitioned-hash-join-builder.h: http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/partitioned-hash-join-builder.h@648 PS15, Line 648: /// Calculates the amount of memory to be transferred for probe streams when probing nit: per probe thread / join node instance http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/partitioned-hash-join-builder.cc@182 PS15, Line 182: nit: extra space http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/partitioned-hash-join-builder.cc@708 PS15, Line 708: probe_barrier_->Wait maybe add this wait time to the join node's idle/wait time. This would help debug issues where one thread will be holding up others http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/exec/partitioned-hash-join-node.cc@1210 PS15, Line 1210: spilled_partitions_.emplace( : build_partition->id(), std::move(probe_hash_partitions_[i])); nit: might be useful to mention in the comment for spilled_partitions_ in the header file when partitions in it can be empty. And/or update the method comment http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/runtime/runtime-state.cc File be/src/runtime/runtime-state.cc: http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/runtime/runtime-state.cc@268 PS15, Line 268: for (CyclicBarrier* cb2 : cancellation_cbs_) { : // Don't add if already present. : if (cb == cb2) return; : } nit: would it be worth using an unordered_set? or will that be an overkill? http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/util/cyclic-barrier.h File be/src/util/cyclic-barrier.h: http://gerrit.cloudera.org:8080/#/c/15096/15/be/src/util/cyclic-barrier.h@46 PS15, Line 46: template can we instead use a typedef to define the function signature in order to enforce the constraint mentioned in the method comment? http://gerrit.cloudera.org:8080/#/c/15096/15/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/15096/15/fe/src/main/java/org/apache/impala/planner/JoinNode.java@193 PS15, Line 193: fragments nit: fragment instances http://gerrit.cloudera.org:8080/#/c/15096/15/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/15096/15/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@371 PS15, Line 371: One instance is scheduled per instance of the fragment nit: slightly ambiguous because of the "instance" terminology, how about: "One instance is scheduled per node, for all instances of the fragment containing the destination join node. http://gerrit.cloudera.org:8080/#/c/15096/15/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/15096/15/fe/src/main/java/org/apache/impala/planner/PlanNode.java@130 PS15, Line 130: // Estimated number of instances that the scheduler would generate for a fragment with nit: across all nodes -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 G
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 15: I forgot to mention, but PS15 is a non-WIP, i.e. it has all the tests I planned to add and has all the TODOs I had fixed. -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 15 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 11 Mar 2020 18:32:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5462/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 15 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 09 Mar 2020 22:07:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5460/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 14 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 09 Mar 2020 22:02:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Hello Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15096 to look at the new patch set (#15). Change subject: IMPALA-9156: share broadcast join builds .. IMPALA-9156: share broadcast join builds The scheduler will only create one join build finstance per backend in cases where this is supported. The builder is aware of the number of finstances executing the probe and hands off the build data structures to the builders. Nested loop join requires minimal modifications because the build data structures are read-only after initial construction. The only significant change is that memory can't be transferred to the multiple consumers, so MarkNeedsDeepCopy() needs to be used instead. Hash join requires additional synchronisation because the spilling algorithm mutates build-side data structures. This patch adds synchronisation so that rebuilding spilled partitions is done in a thread-safe manner, using a single thread. This uses the CyclicBarrier added in an earlier patch. Threads blocked on CyclicBarrier need to be cancellable, which is handled by cancelling the barrier when cancelling fragments on the backend. BufferPool now correctly handles multiple threads calling CleanPages() concurrently, which makes other methods thread-safe. Update planner to cost broadcast join and estimate memory consumption based on a single instance per node. Planner estimates of number of instances are improved. Instead of assuming mt_dop instances per node, use the total number of input splits (also called scan ranges in places) as an upper bound on the number of instances generated by scans. These instance estimates from the scan nodes are then propagated up the plan tree in the same was as the numNodes estimates. The instance estimate for the join build fragment is fixed to be based on the destination fragment. The profile now correctly accounts for time waiting for the builder, counting it in inactive time and showing it in the node timeline. Additional improvements/cleanup to the time accounting are deferring until IMPALA-9422. Testing: * Updated planner tests * Ran a single node stress test with TPC-H and TPC-DS * Add a targeted test for spilling broadcast joins, both repartitioning and not repartitioning. * Add a targeted test for a spilling broadcast join with empty probe * Add a targeted test for spilling broadcast join with empty build partitions. * Add a broadcast join to test_cancellation and test_failpoints. Perf: I did a single node run on my desktop: +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(30) | parquet / none / none | 6.26| -15.70%| 4.63 | -16.16%| +--+---+-++++ +--+--+---++-++---++---++-+-+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval| +--+--+---++-++---++---++-+-+ | TPCH(30) | TPCH-Q21 | parquet / none / none | 24.97 | 23.25 | R +7.38% | 0.51% | 0.22%| 5 | R +6.95% | 2.31| 27.93 | | TPCH(30) | TPCH-Q4 | parquet / none / none | 2.83 | 2.79| +1.31% | 1.86% | 0.36%| 5 | +1.88% | 1.15| 1.53| | TPCH(30) | TPCH-Q6 | parquet / none / none | 1.28 | 1.28| -0.01% | 1.64% | 1.63%| 5 | -0.11% | -0.58 | -0.01 | | TPCH(30) | TPCH-Q22 | parquet / none / none | 2.65 | 2.68| -0.94% | 0.84% | 1.46%| 5 | -0.21% | -0.87 | -1.25 | | TPCH(30) | TPCH-Q1 | parquet / none / none | 4.69 | 4.72| -0.56% | 1.29% | 0.52%| 5 | -1.04% | -1.15 | -0.89 | | TPCH(30) | TPCH-Q13 | parquet / none / none | 10.64 | 10.80 | -1.48% | 0.61% | 0.60%| 5 | -1.39% | -1.73 | -3.91 | | TPCH(30) | TPCH-Q15 | parquet / none / none | 4.11 | 4.32| -4.92% | 0.05% | 0.40%| 5 | -4.93% | -2.31 | -27.46 | | TPCH(30) | TPCH-Q20 | parquet / none / none | 3.47 | 3.67| I -5.41% | 0.81% | 0.03%| 5 | I -5.70% | -2.31 | -15.75 | | TPCH(30) | TPCH-Q17 | parquet / none / none | 7.58 | 8.14| I -6.93% | 3.13% | 2.62%| 5 | I -9.31%
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 14: More testing revealed that the cancellation approach didn't work right in all cases - a deadlock could happen with a thread in PhjBuilder::HandoffToProbesAndWait() and one or more, but not all of, the probe threads blocked on probe_barrier_. An external cancellation will not wake up any of these threads. Switched to using the same cancellation approach uniformly where RuntimeState::Cancel() cancels barriers as well as signalling condition variables. So if the coordinator cancels the query, it gets propagated. Or if one of the other threads on the backend fails, it unblocks the remaining threads. -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 14 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 09 Mar 2020 21:23:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Hello Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15096 to look at the new patch set (#14). Change subject: IMPALA-9156: share broadcast join builds .. IMPALA-9156: share broadcast join builds The scheduler will only create one join build finstance per backend in cases where this is supported. The builder is aware of the number of finstances executing the probe and hands off the build data structures to the builders. Nested loop join requires minimal modifications because the build data structures are read-only after initial construction. The only significant change is that memory can't be transferred to the multiple consumers, so MarkNeedsDeepCopy() needs to be used instead. Hash join requires additional synchronisation because the spilling algorithm mutates build-side data structures. This patch adds synchronisation so that rebuilding spilled partitions is done in a thread-safe manner, using a single thread. This uses the CyclicBarrier added in an earlier patch. Threads blocked on CyclicBarrier need to be cancellable, which is handled by cancelling the barrier when cancelling fragments on the backend. BufferPool now correctly handles multiple threads calling CleanPages() concurrently, which makes other methods thread-safe. Update planner to cost broadcast join and estimate memory consumption based on a single instance per node. Planner estimates of number of instances are improved. Instead of assuming mt_dop instances per node, use the total number of input splits (also called scan ranges in places) as an upper bound on the number of instances generated by scans. These instance estimates from the scan nodes are then propagated up the plan tree in the same was as the numNodes estimates. The instance estimate for the join build fragment is fixed to be based on the destination fragment. The profile now correctly accounts for time waiting for the builder, counting it in inactive time and showing it in the node timeline. Additional improvements/cleanup to the time accounting are deferring until IMPALA-9422. Testing: * Updated planner tests * Ran a single node stress test with TPC-H and TPC-DS * Add a targeted test for spilling broadcast joins, both repartitioning and not repartitioning. * Add a targeted test for a spilling broadcast join with empty probe * Add a targeted test for spilling broadcast join with empty build partitions. * Add a broadcast join to test_cancellation and test_failpoints. Perf: I did a single node run on my desktop: +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(30) | parquet / none / none | 6.26| -15.70%| 4.63 | -16.16%| +--+---+-++++ +--+--+---++-++---++---++-+-+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval| +--+--+---++-++---++---++-+-+ | TPCH(30) | TPCH-Q21 | parquet / none / none | 24.97 | 23.25 | R +7.38% | 0.51% | 0.22%| 5 | R +6.95% | 2.31| 27.93 | | TPCH(30) | TPCH-Q4 | parquet / none / none | 2.83 | 2.79| +1.31% | 1.86% | 0.36%| 5 | +1.88% | 1.15| 1.53| | TPCH(30) | TPCH-Q6 | parquet / none / none | 1.28 | 1.28| -0.01% | 1.64% | 1.63%| 5 | -0.11% | -0.58 | -0.01 | | TPCH(30) | TPCH-Q22 | parquet / none / none | 2.65 | 2.68| -0.94% | 0.84% | 1.46%| 5 | -0.21% | -0.87 | -1.25 | | TPCH(30) | TPCH-Q1 | parquet / none / none | 4.69 | 4.72| -0.56% | 1.29% | 0.52%| 5 | -1.04% | -1.15 | -0.89 | | TPCH(30) | TPCH-Q13 | parquet / none / none | 10.64 | 10.80 | -1.48% | 0.61% | 0.60%| 5 | -1.39% | -1.73 | -3.91 | | TPCH(30) | TPCH-Q15 | parquet / none / none | 4.11 | 4.32| -4.92% | 0.05% | 0.40%| 5 | -4.93% | -2.31 | -27.46 | | TPCH(30) | TPCH-Q20 | parquet / none / none | 3.47 | 3.67| I -5.41% | 0.81% | 0.03%| 5 | I -5.70% | -2.31 | -15.75 | | TPCH(30) | TPCH-Q17 | parquet / none / none | 7.58 | 8.14| I -6.93% | 3.13% | 2.62%| 5 | I -9.31%
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/15096/13/be/src/runtime/bufferpool/buffer-pool.h File be/src/runtime/bufferpool/buffer-pool.h: http://gerrit.cloudera.org:8080/#/c/15096/13/be/src/runtime/bufferpool/buffer-pool.h@392 PS13, Line 392: /// This is safe to call concurrently from multiple threads, as long as those threads Spilling tests are failing intermittently because this isn't strictly true - sometimes CleanPages() doesn't clean enough pages if multiple threads call it concurrently for the builder. I'm gonna work on a unit test and a fix. -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 13 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 07 Mar 2020 21:49:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 11: PS9->PS12 adds some tests and fixes some accounting bugs that were discovered with the tests, specifically * THe spilled partitions got out of sync between the join nodes because of the empty probe side optimisation. I just disabled that optimisation because i don't think it's worth the complexity. * I didn't multiply the probe reservation by the number of threads in one place * num_spilled_probe_rows wasn't added together correctly across threads * NodeDebugString had an invalid dcheck that got hit when I set -vmodule=partitioned-hash-join-node=3 -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 11 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 07 Mar 2020 00:57:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Hello Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15096 to look at the new patch set (#12). Change subject: IMPALA-9156: share broadcast join builds .. IMPALA-9156: share broadcast join builds The scheduler will only create one join build finstance per backend in cases where this is supported. The builder is aware of the number of finstances executing the probe and hands off the build data structures to the builders. Nested loop join requires minimal modifications because the build data structures are read-only after initial construction. The only significant change is that memory can't be transferred to the multiple consumers, so MarkNeedsDeepCopy() needs to be used instead. Hash join requires additional synchronisation because the spilling algorithm mutates build-side data structures. This patch adds synchronisation so that rebuilding spilled partitions is done in a thread-safe manner, using a single thread. This uses the CyclicBarrier added in an earlier patch. Threads blocked on CyclicBarrier need to be cancellable, which is handled by cancelling the barrier when closing the join builder. Update planner to cost broadcast join and estimate memory consumption based on a single instance per node. Planner estimates of number of instances are improved. Instead of assuming mt_dop instances per node, use the total number of input splits (also called scan ranges in places) as an upper bound on the number of instances generated by scans. These instance estimates from the scan nodes are then propagated up the plan tree in the same was as the numNodes estimates. The instance estimate for the join build fragment is fixed to be based on the destination fragment. The profile now correctly accounts for time waiting for the builder, counting it in inactive time and showing it in the node timeline. Additional improvements/cleanup to the time accounting are deferring until IMPALA-9422. Testing: * Updated planner tests * Ran a single node stress test with TPC-H and TPC-DS * Add a targeted test for spilling broadcast joins, both repartitioning and not repartitioning. * Add a targeted test for a spilling broadcast join with empty probe * Add a targeted test for spilling broadcast join with empty build partitions. * Add a broadcast join to test_cancellation and test_failpoints. Perf: I did a single node run on my desktop: +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(30) | parquet / none / none | 6.26| -15.70%| 4.63 | -16.16%| +--+---+-++++ +--+--+---++-++---++---++-+-+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval| +--+--+---++-++---++---++-+-+ | TPCH(30) | TPCH-Q21 | parquet / none / none | 24.97 | 23.25 | R +7.38% | 0.51% | 0.22%| 5 | R +6.95% | 2.31| 27.93 | | TPCH(30) | TPCH-Q4 | parquet / none / none | 2.83 | 2.79| +1.31% | 1.86% | 0.36%| 5 | +1.88% | 1.15| 1.53| | TPCH(30) | TPCH-Q6 | parquet / none / none | 1.28 | 1.28| -0.01% | 1.64% | 1.63%| 5 | -0.11% | -0.58 | -0.01 | | TPCH(30) | TPCH-Q22 | parquet / none / none | 2.65 | 2.68| -0.94% | 0.84% | 1.46%| 5 | -0.21% | -0.87 | -1.25 | | TPCH(30) | TPCH-Q1 | parquet / none / none | 4.69 | 4.72| -0.56% | 1.29% | 0.52%| 5 | -1.04% | -1.15 | -0.89 | | TPCH(30) | TPCH-Q13 | parquet / none / none | 10.64 | 10.80 | -1.48% | 0.61% | 0.60%| 5 | -1.39% | -1.73 | -3.91 | | TPCH(30) | TPCH-Q15 | parquet / none / none | 4.11 | 4.32| -4.92% | 0.05% | 0.40%| 5 | -4.93% | -2.31 | -27.46 | | TPCH(30) | TPCH-Q20 | parquet / none / none | 3.47 | 3.67| I -5.41% | 0.81% | 0.03%| 5 | I -5.70% | -2.31 | -15.75 | | TPCH(30) | TPCH-Q17 | parquet / none / none | 7.58 | 8.14| I -6.93% | 3.13% | 2.62%| 5 | I -9.31% | -2.02 | -3.96 | | TPCH(30) | TPCH-Q9 | parquet / none / none | 15.59 | 17.02 | I -8.38% | 0.95% | 0.43%
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Hello Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15096 to look at the new patch set (#11). Change subject: IMPALA-9156: share broadcast join builds .. IMPALA-9156: share broadcast join builds The scheduler will only create one join build finstance per backend in cases where this is supported. The builder is aware of the number of finstances executing the probe and hands off the build data structures to the builders. Nested loop join requires minimal modifications because the build data structures are read-only after initial construction. The only significant change is that memory can't be transferred to the multiple consumers, so MarkNeedsDeepCopy() needs to be used instead. Hash join requires additional synchronisation because the spilling algorithm mutates build-side data structures. This patch adds synchronisation so that rebuilding spilled partitions is done in a thread-safe manner, using a single thread. This uses the CyclicBarrier added in an earlier patch. Threads blocked on CyclicBarrier need to be cancellable, which is handled by cancelling the barrier when closing the join builder. Update planner to cost broadcast join and estimate memory consumption based on a single instance per node. Planner estimates of number of instances are improved. Instead of assuming mt_dop instances per node, use the total number of input splits (also called scan ranges in places) as an upper bound on the number of instances generated by scans. These instance estimates from the scan nodes are then propagated up the plan tree in the same was as the numNodes estimates. The instance estimate for the join build fragment is fixed to be based on the destination fragment. The profile now correctly accounts for time waiting for the builder, counting it in inactive time and showing it in the node timeline. Additional improvements/cleanup to the time accounting are deferring until IMPALA-9422. Testing: * Updated planner tests * Ran a single node stress test with TPC-H and TPC-DS * Add a targeted test for spilling broadcast joins, both repartitioning and not repartitioning. * Add a targeted test for a spilling broadcast join with empty probe * Add a targeted test for spilling broadcast join with empty build partitions. * Add a broadcast join to test_cancellation and test_failpoints. Perf: I did a single node run on my desktop: +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(30) | parquet / none / none | 6.26| -15.70%| 4.63 | -16.16%| +--+---+-++++ +--+--+---++-++---++---++-+-+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval| +--+--+---++-++---++---++-+-+ | TPCH(30) | TPCH-Q21 | parquet / none / none | 24.97 | 23.25 | R +7.38% | 0.51% | 0.22%| 5 | R +6.95% | 2.31| 27.93 | | TPCH(30) | TPCH-Q4 | parquet / none / none | 2.83 | 2.79| +1.31% | 1.86% | 0.36%| 5 | +1.88% | 1.15| 1.53| | TPCH(30) | TPCH-Q6 | parquet / none / none | 1.28 | 1.28| -0.01% | 1.64% | 1.63%| 5 | -0.11% | -0.58 | -0.01 | | TPCH(30) | TPCH-Q22 | parquet / none / none | 2.65 | 2.68| -0.94% | 0.84% | 1.46%| 5 | -0.21% | -0.87 | -1.25 | | TPCH(30) | TPCH-Q1 | parquet / none / none | 4.69 | 4.72| -0.56% | 1.29% | 0.52%| 5 | -1.04% | -1.15 | -0.89 | | TPCH(30) | TPCH-Q13 | parquet / none / none | 10.64 | 10.80 | -1.48% | 0.61% | 0.60%| 5 | -1.39% | -1.73 | -3.91 | | TPCH(30) | TPCH-Q15 | parquet / none / none | 4.11 | 4.32| -4.92% | 0.05% | 0.40%| 5 | -4.93% | -2.31 | -27.46 | | TPCH(30) | TPCH-Q20 | parquet / none / none | 3.47 | 3.67| I -5.41% | 0.81% | 0.03%| 5 | I -5.70% | -2.31 | -15.75 | | TPCH(30) | TPCH-Q17 | parquet / none / none | 7.58 | 8.14| I -6.93% | 3.13% | 2.62%| 5 | I -9.31% | -2.02 | -3.96 | | TPCH(30) | TPCH-Q9 | parquet / none / none | 15.59 | 17.02 | I -8.38% | 0.95% | 0.43%
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 10: (17 comments) http://gerrit.cloudera.org:8080/#/c/15096/10/be/src/exec/partitioned-hash-join-builder.h File be/src/exec/partitioned-hash-join-builder.h: http://gerrit.cloudera.org:8080/#/c/15096/10/be/src/exec/partitioned-hash-join-builder.h@444 PS10, Line 444: void IncrementNumSpilledProbeRows(int64_t count) { num_spilled_probe_rows_.Add(count); } line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/15096/10/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/15096/10/be/src/exec/partitioned-hash-join-builder.cc@601 PS10, Line 601: VLOG(3) << "PHJ(node_id=" << join_node_id_ << ") probe_stream_reservation_=" << probe_stream_reservation_.GetReservation(); line too long (125 > 90) http://gerrit.cloudera.org:8080/#/c/15096/10/be/src/exec/partitioned-hash-join-builder.cc@603 PS10, Line 603: VLOG(3) << "PHJ(node_id=" << join_node_id_ << ") probe_stream_reservation_=" << probe_stream_reservation_.GetReservation(); line too long (125 > 90) http://gerrit.cloudera.org:8080/#/c/15096/10/be/src/exec/partitioned-hash-join-builder.cc@627 PS10, Line 627: VLOG(3) << "PHJ(node_id=" << join_node_id_ << ") probe_stream_reservation_=" << probe_stream_reservation_.GetReservation(); line too long (125 > 90) http://gerrit.cloudera.org:8080/#/c/15096/10/be/src/exec/partitioned-hash-join-builder.cc@629 PS10, Line 629: VLOG(3) << "PHJ(node_id=" << join_node_id_ << ") probe_stream_reservation_=" << probe_stream_reservation_.GetReservation(); line too long (125 > 90) http://gerrit.cloudera.org:8080/#/c/15096/10/be/src/exec/partitioned-hash-join-builder.cc@632 PS10, Line 632: VLOG(3) << "PHJ(node_id=" << join_node_id_ << ") transferring " << probe_reservation << " back to builder."; line too long (112 > 90) http://gerrit.cloudera.org:8080/#/c/15096/10/be/src/exec/partitioned-hash-join-builder.cc@712 PS10, Line 712: VLOG(3) << "PHJ(node_id=" << join_node_id_ << ") transferring " << probe_reservation << " back to builder."; line too long (112 > 90) http://gerrit.cloudera.org:8080/#/c/15096/10/be/src/exec/partitioned-hash-join-builder.cc@776 PS10, Line 776: VLOG(3) << "PHJ(node_id=" << join_node_id_ << ") transferring " << probe_reservation << " back to builder."; line too long (112 > 90) http://gerrit.cloudera.org:8080/#/c/15096/10/be/src/exec/partitioned-hash-join-builder.cc@940 PS10, Line 940: int64_t saved_probe_reservation = need_probe_buffer ? max_row_buffer_size_ * num_probe_threads_ : 0; line too long (102 > 90) http://gerrit.cloudera.org:8080/#/c/15096/10/be/src/exec/partitioned-hash-join-builder.cc@941 PS10, Line 941: VLOG(3) << "PHJ(node_id=" << join_node_id_ << ") probe_stream_reservation_=" << probe_stream_reservation_.GetReservation(); line too long (125 > 90) http://gerrit.cloudera.org:8080/#/c/15096/10/be/src/exec/partitioned-hash-join-builder.cc@944 PS10, Line 944: VLOG(3) << "PHJ(node_id=" << join_node_id_ << ") probe_stream_reservation_=" << probe_stream_reservation_.GetReservation(); line too long (125 > 90) http://gerrit.cloudera.org:8080/#/c/15096/10/be/src/exec/partitioned-hash-join-builder.cc@966 PS10, Line 966: VLOG(3) << "PHJ(node_id=" << join_node_id_ << ") probe_stream_reservation_=" << probe_stream_reservation_.GetReservation(); line too long (125 > 90) http://gerrit.cloudera.org:8080/#/c/15096/10/be/src/exec/partitioned-hash-join-builder.cc@969 PS10, Line 969: VLOG(3) << "PHJ(node_id=" << join_node_id_ << ") probe_stream_reservation_=" << probe_stream_reservation_.GetReservation(); line too long (125 > 90) http://gerrit.cloudera.org:8080/#/c/15096/10/be/src/exec/partitioned-hash-join-builder.cc@1051 PS10, Line 1051: VLOG(3) << "PHJ(node_id=" << join_node_id_ << ") transferring " << bytes << " back to builder."; line too long (98 > 90) http://gerrit.cloudera.org:8080/#/c/15096/10/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/15096/10/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@201 PS10, Line 201: DataStreamSink streamSink = new DataStreamSink((ExchangeNode)destNode_, outputPartition_); line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/15096/10/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@371 PS10, Line 371: // join. ParallelPlanner sets the destination fragment when adding the JoinBuildSink. line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/15096/10/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@400 PS10, Line 400: // join. ParallelPlanner sets the destination fragment when adding the JoinBuild
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Hello Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15096 to look at the new patch set (#10). Change subject: IMPALA-9156: share broadcast join builds .. IMPALA-9156: share broadcast join builds The scheduler will only create one join build finstance per backend in cases where this is supported. The builder is aware of the number of finstances executing the probe and hands off the build data structures to the builders. Nested loop join requires minimal modifications because the build data structures are read-only after initial construction. The only significant change is that memory can't be transferred to the multiple consumers, so MarkNeedsDeepCopy() needs to be used instead. Hash join requires additional synchronisation because the spilling algorithm mutates build-side data structures. This patch adds synchronisation so that rebuilding spilled partitions is done in a thread-safe manner, using a single thread. This uses the CyclicBarrier added in an earlier patch. Threads blocked on CyclicBarrier need to be cancellable, which is handled by cancelling the barrier when closing the join builder. Update planner to cost broadcast join and estimate memory consumption based on a single instance per node. Planner estimates of number of instances are improved. Instead of assuming mt_dop instances per node, use the total number of input splits (also called scan ranges in places) as an upper bound on the number of instances generated by scans. These instance estimates from the scan nodes are then propagated up the plan tree in the same was as the numNodes estimates. The instance estimate for the join build fragment is fixed to be based on the destination fragment. The profile now correctly accounts for time waiting for the builder, counting it in inactive time and showing it in the node timeline. Additional improvements/cleanup to the time accounting are deferring until IMPALA-9422. Testing: * Updated planner tests * Ran a single node stress test with TPC-H and TPC-DS * Add a targeted test for spilling broadcast joins, both repartitioning and not repartitioning. * Add a targeted test for a spilling broadcast join with empty probe * Add a targeted test for spilling broadcast join with empty build partitions. * Add a broadcast join to test_cancellation and test_failpoints. Perf: I did a single node run on my desktop: +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(30) | parquet / none / none | 6.26| -15.70%| 4.63 | -16.16%| +--+---+-++++ +--+--+---++-++---++---++-+-+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval| +--+--+---++-++---++---++-+-+ | TPCH(30) | TPCH-Q21 | parquet / none / none | 24.97 | 23.25 | R +7.38% | 0.51% | 0.22%| 5 | R +6.95% | 2.31| 27.93 | | TPCH(30) | TPCH-Q4 | parquet / none / none | 2.83 | 2.79| +1.31% | 1.86% | 0.36%| 5 | +1.88% | 1.15| 1.53| | TPCH(30) | TPCH-Q6 | parquet / none / none | 1.28 | 1.28| -0.01% | 1.64% | 1.63%| 5 | -0.11% | -0.58 | -0.01 | | TPCH(30) | TPCH-Q22 | parquet / none / none | 2.65 | 2.68| -0.94% | 0.84% | 1.46%| 5 | -0.21% | -0.87 | -1.25 | | TPCH(30) | TPCH-Q1 | parquet / none / none | 4.69 | 4.72| -0.56% | 1.29% | 0.52%| 5 | -1.04% | -1.15 | -0.89 | | TPCH(30) | TPCH-Q13 | parquet / none / none | 10.64 | 10.80 | -1.48% | 0.61% | 0.60%| 5 | -1.39% | -1.73 | -3.91 | | TPCH(30) | TPCH-Q15 | parquet / none / none | 4.11 | 4.32| -4.92% | 0.05% | 0.40%| 5 | -4.93% | -2.31 | -27.46 | | TPCH(30) | TPCH-Q20 | parquet / none / none | 3.47 | 3.67| I -5.41% | 0.81% | 0.03%| 5 | I -5.70% | -2.31 | -15.75 | | TPCH(30) | TPCH-Q17 | parquet / none / none | 7.58 | 8.14| I -6.93% | 3.13% | 2.62%| 5 | I -9.31% | -2.02 | -3.96 | | TPCH(30) | TPCH-Q9 | parquet / none / none | 15.59 | 17.02 | I -8.38% | 0.95% | 0.43%
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5449/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 10 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 07 Mar 2020 01:28:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5451/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 12 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 07 Mar 2020 01:38:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5450/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 11 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 07 Mar 2020 01:37:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5454/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 13 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 07 Mar 2020 02:43:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Hello Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15096 to look at the new patch set (#13). Change subject: IMPALA-9156: share broadcast join builds .. IMPALA-9156: share broadcast join builds The scheduler will only create one join build finstance per backend in cases where this is supported. The builder is aware of the number of finstances executing the probe and hands off the build data structures to the builders. Nested loop join requires minimal modifications because the build data structures are read-only after initial construction. The only significant change is that memory can't be transferred to the multiple consumers, so MarkNeedsDeepCopy() needs to be used instead. Hash join requires additional synchronisation because the spilling algorithm mutates build-side data structures. This patch adds synchronisation so that rebuilding spilled partitions is done in a thread-safe manner, using a single thread. This uses the CyclicBarrier added in an earlier patch. Threads blocked on CyclicBarrier need to be cancellable, which is handled by cancelling the barrier when closing the join builder. Update planner to cost broadcast join and estimate memory consumption based on a single instance per node. Planner estimates of number of instances are improved. Instead of assuming mt_dop instances per node, use the total number of input splits (also called scan ranges in places) as an upper bound on the number of instances generated by scans. These instance estimates from the scan nodes are then propagated up the plan tree in the same was as the numNodes estimates. The instance estimate for the join build fragment is fixed to be based on the destination fragment. The profile now correctly accounts for time waiting for the builder, counting it in inactive time and showing it in the node timeline. Additional improvements/cleanup to the time accounting are deferring until IMPALA-9422. Testing: * Updated planner tests * Ran a single node stress test with TPC-H and TPC-DS * Add a targeted test for spilling broadcast joins, both repartitioning and not repartitioning. * Add a targeted test for a spilling broadcast join with empty probe * Add a targeted test for spilling broadcast join with empty build partitions. * Add a broadcast join to test_cancellation and test_failpoints. Perf: I did a single node run on my desktop: +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(30) | parquet / none / none | 6.26| -15.70%| 4.63 | -16.16%| +--+---+-++++ +--+--+---++-++---++---++-+-+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval| +--+--+---++-++---++---++-+-+ | TPCH(30) | TPCH-Q21 | parquet / none / none | 24.97 | 23.25 | R +7.38% | 0.51% | 0.22%| 5 | R +6.95% | 2.31| 27.93 | | TPCH(30) | TPCH-Q4 | parquet / none / none | 2.83 | 2.79| +1.31% | 1.86% | 0.36%| 5 | +1.88% | 1.15| 1.53| | TPCH(30) | TPCH-Q6 | parquet / none / none | 1.28 | 1.28| -0.01% | 1.64% | 1.63%| 5 | -0.11% | -0.58 | -0.01 | | TPCH(30) | TPCH-Q22 | parquet / none / none | 2.65 | 2.68| -0.94% | 0.84% | 1.46%| 5 | -0.21% | -0.87 | -1.25 | | TPCH(30) | TPCH-Q1 | parquet / none / none | 4.69 | 4.72| -0.56% | 1.29% | 0.52%| 5 | -1.04% | -1.15 | -0.89 | | TPCH(30) | TPCH-Q13 | parquet / none / none | 10.64 | 10.80 | -1.48% | 0.61% | 0.60%| 5 | -1.39% | -1.73 | -3.91 | | TPCH(30) | TPCH-Q15 | parquet / none / none | 4.11 | 4.32| -4.92% | 0.05% | 0.40%| 5 | -4.93% | -2.31 | -27.46 | | TPCH(30) | TPCH-Q20 | parquet / none / none | 3.47 | 3.67| I -5.41% | 0.81% | 0.03%| 5 | I -5.70% | -2.31 | -15.75 | | TPCH(30) | TPCH-Q17 | parquet / none / none | 7.58 | 8.14| I -6.93% | 3.13% | 2.62%| 5 | I -9.31% | -2.02 | -3.96 | | TPCH(30) | TPCH-Q9 | parquet / none / none | 15.59 | 17.02 | I -8.38% | 0.95% | 0.43%
[Impala-ASF-CR] IMPALA-9156: share broadcast join builds
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 ) Change subject: IMPALA-9156: share broadcast join builds .. Patch Set 13: Rebased and fix an overly strict dcheck that was triggered by TestSpillingNoDebugActionDimensions::test_spilling_naaj_no_deny_reservation -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 13 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 07 Mar 2020 01:59:10 + Gerrit-HasComments: No