[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 10 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 26 Nov 2019 11:31:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 10 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 26 Nov 2019 06:51:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5297/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 10 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 26 Nov 2019 06:51:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 9 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 26 Nov 2019 03:52:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5134/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 9 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 25 Nov 2019 23:44:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 9: Code-Review+2 carry -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 9 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 25 Nov 2019 21:41:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5293/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 9 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 25 Nov 2019 21:41:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 8: (6 comments) http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.h@86 PS8, Line 86: that in > that are used in? Done here and below http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.h@404 PS8, Line 404: ProbeState* next_state > Might be nice to make this a 'bool* done' instead, to be consistent with th Done http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.h@421 PS8, Line 421: probe_state_ > nit: 'probe_state_' Done http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.cc@57 PS8, Line 57: state_(HashJoinState::PARTITIONING_BUILD), > Any reason not to just put this initialization the header? Nope, moved the other constant initialisers there too. http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.cc@1134 PS8, Line 1134: DCHECK_EQ(probe_batch_pos_, -1); > DCHECK_ENUM_EQ(probe_state_, ProbeState::PROBING_END_BATCH)?} Done http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.cc@1141 PS8, Line 1141: DCHECK > DCHECK_ENUM_EQ Done -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 25 Nov 2019 21:41:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Hello Michael Ho, Thomas Tauber-Marshall, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14688 to look at the new patch set (#9). Change subject: IMPALA-9127: explicit probe state machine in hash join .. IMPALA-9127: explicit probe state machine in hash join This refactors the main loop in PartitionedHashJoinNode::GetNext() to use an explicit state machine, rather than the hard-to-follow implicit state machine previously used. A new state variable 'probe_state_' is used to drive the loop, with DCHECKs enforcing invariants of other member variables. I deliberately tried to minimise changes to other functions (including any attempts to factor logic out of GetNext()) to minimise the scope of this patch. The new logic is mostly equivalent to the old logic, although there may be a different number of trips through the loop because of the way the cascading checks in the old version worked. A few notable changes: * DoneProbing() is consistently called when probing is finished, including in cases, like probing a single spilled partition, where it wasn't previously. * The repeated AtCapacity() checks are consolidated into a single check that happens at the end of the loop. Resources attached to batches should still be flushed at the appropriate points, since each previous "if (out_batch->AtCapacity()) break;" corresponds to a new loop iteration in the new code. * OutputNullAwareNullProbe() and OutputNullAwareProbeRows() now explicitly signal when they are done using an output argument, instead of implicitly via AtCapacity(), which is incredibly error-prone. Testing: We have adequate coverage for different join modes, including with spilling. * Ran exhaustive tests. * Ran a single node stress test with TPC-H and TPC-DS * Ran a single node stress test with larger scale factor Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 --- M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h 3 files changed, 363 insertions(+), 213 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/14688/9 -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 9 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 8: Code-Review+2 (6 comments) http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.h@86 PS8, Line 86: that in that are used in? http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.h@404 PS8, Line 404: ProbeState* next_state Might be nice to make this a 'bool* done' instead, to be consistent with the methods below and to keep all of the places where we set the ProbeState completely contained in GetNext(). Not a big deal, though http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.h@421 PS8, Line 421: probe_state_ nit: 'probe_state_' http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.cc@57 PS8, Line 57: state_(HashJoinState::PARTITIONING_BUILD), Any reason not to just put this initialization the header? http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.cc@1134 PS8, Line 1134: DCHECK_EQ(probe_batch_pos_, -1); DCHECK_ENUM_EQ(probe_state_, ProbeState::PROBING_END_BATCH)?} http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.cc@1141 PS8, Line 1141: DCHECK DCHECK_ENUM_EQ -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 25 Nov 2019 18:50:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5111/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 22 Nov 2019 01:57:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5110/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 22 Nov 2019 01:23:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Hello Michael Ho, Thomas Tauber-Marshall, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14688 to look at the new patch set (#8). Change subject: IMPALA-9127: explicit probe state machine in hash join .. IMPALA-9127: explicit probe state machine in hash join This refactors the main loop in PartitionedHashJoinNode::GetNext() to use an explicit state machine, rather than the hard-to-follow implicit state machine previously used. A new state variable 'probe_state_' is used to drive the loop, with DCHECKs enforcing invariants of other member variables. I deliberately tried to minimise changes to other functions (including any attempts to factor logic out of GetNext()) to minimise the scope of this patch. The new logic is mostly equivalent to the old logic, although there may be a different number of trips through the loop because of the way the cascading checks in the old version worked. A few notable changes: * DoneProbing() is consistently called when probing is finished, including in cases, like probing a single spilled partition, where it wasn't previously. * The repeated AtCapacity() checks are consolidated into a single check that happens at the end of the loop. Resources attached to batches should still be flushed at the appropriate points, since each previous "if (out_batch->AtCapacity()) break;" corresponds to a new loop iteration in the new code. * OutputNullAwareNullProbe() and OutputNullAwareProbeRows() now explicitly signal when they are done using an output argument, instead of implicitly via AtCapacity(), which is incredibly error-prone. Testing: We have adequate coverage for different join modes, including with spilling. * Ran exhaustive tests. * Ran a single node stress test with TPC-H and TPC-DS * Ran a single node stress test with larger scale factor Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 --- M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h 3 files changed, 352 insertions(+), 201 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/14688/8 -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 8: Code-Review+1 PS7 was the wrong version, sorry about that -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 22 Nov 2019 00:46:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 7: Code-Review+1 Rebased, carry +1 -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 22 Nov 2019 00:38:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Hello Michael Ho, Thomas Tauber-Marshall, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14688 to look at the new patch set (#7). Change subject: IMPALA-9127: explicit probe state machine in hash join .. IMPALA-9127: explicit probe state machine in hash join This refactors the main loop in PartitionedHashJoinNode::GetNext() to use an explicit state machine, rather than the hard-to-follow implicit state machine previously used. A new state variable 'probe_state_' is used to drive the loop, with DCHECKs enforcing invariants of other member variables. I deliberately tried to minimise changes to other functions (including any attempts to factor logic out of GetNext()) to minimise the scope of this patch. The new logic is mostly equivalent to the old logic, although there may be a different number of trips through the loop because of the way the cascading checks in the old version worked. A few notable changes: * DoneProbing() is consistently called when probing is finished, including in cases, like probing a single spilled partition, where it wasn't previously. * The repeated AtCapacity() checks are consolidated into a single check that happens at the end of the loop. Resources attached to batches should still be flushed at the appropriate points, since each previous "if (out_batch->AtCapacity()) break;" corresponds to a new loop iteration in the new code. * OutputNullAwareNullProbe() and OutputNullAwareProbeRows() now explicitly signal when they are done using an output argument, instead of implicitly via AtCapacity(), which is incredibly error-prone. Testing: We have adequate coverage for different join modes, including with spilling. * Ran exhaustive tests. * Ran a single node stress test with TPC-H and TPC-DS * Ran a single node stress test with a debug action to force spilling: --impalad_args="-default_query_options=DEBUG_ACTION=-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@0.5" Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 --- M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h 3 files changed, 327 insertions(+), 200 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/14688/7 -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 6: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 21 Nov 2019 20:56:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5066/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 19 Nov 2019 18:22:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/14688/5/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: http://gerrit.cloudera.org:8080/#/c/14688/5/be/src/exec/partitioned-hash-join-node.h@204 PS5, Line 204: beein > typo Done http://gerrit.cloudera.org:8080/#/c/14688/5/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: http://gerrit.cloudera.org:8080/#/c/14688/5/be/src/exec/partitioned-hash-join-node.cc@522 PS5, Line 522: case ProbeState::PROBING_IN_BATCH: { > optional: I think that it would be nice to move this and the next case into Done. It was pretty easy to extract some of the logic. I kept the logic relating to state transitions in this function so that the state machine is defined in this lop as much as possible. -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 19 Nov 2019 17:37:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Hello Michael Ho, Thomas Tauber-Marshall, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14688 to look at the new patch set (#6). Change subject: IMPALA-9127: explicit probe state machine in hash join .. IMPALA-9127: explicit probe state machine in hash join This refactors the main loop in PartitionedHashJoinNode::GetNext() to use an explicit state machine, rather than the hard-to-follow implicit state machine previously used. A new state variable 'probe_state_' is used to drive the loop, with DCHECKs enforcing invariants of other member variables. I deliberately tried to minimise changes to other functions (including any attempts to factor logic out of GetNext()) to minimise the scope of this patch. The new logic is mostly equivalent to the old logic, although there may be a different number of trips through the loop because of the way the cascading checks in the old version worked. A few notable changes: * DoneProbing() is consistently called when probing is finished, including in cases, like probing a single spilled partition, where it wasn't previously. * The repeated AtCapacity() checks are consolidated into a single check that happens at the end of the loop. Resources attached to batches should still be flushed at the appropriate points, since each previous "if (out_batch->AtCapacity()) break;" corresponds to a new loop iteration in the new code. * OutputNullAwareNullProbe() and OutputNullAwareProbeRows() now explicitly signal when they are done using an output argument, instead of implicitly via AtCapacity(), which is incredibly error-prone. Testing: We have adequate coverage for different join modes, including with spilling. * Ran exhaustive tests. * Ran a single node stress test with TPC-H and TPC-DS * Ran a single node stress test with larger scale factor Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 --- M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h 3 files changed, 354 insertions(+), 202 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/14688/6 -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/14688/5/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: http://gerrit.cloudera.org:8080/#/c/14688/5/be/src/exec/partitioned-hash-join-node.h@204 PS5, Line 204: beein typo http://gerrit.cloudera.org:8080/#/c/14688/5/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: http://gerrit.cloudera.org:8080/#/c/14688/5/be/src/exec/partitioned-hash-join-node.cc@522 PS5, Line 522: case ProbeState::PROBING_IN_BATCH: { optional: I think that it would be nice to move this and the next case into different functions, e.g. ProcessCurrentProbeBatch() and NextProbeBatch() (and probably rename NextProbeRowBatch() to get NextProbeRowBatchFromChild() or something similar). The functions could be ordered like the old logic to reduce the diff. -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 19 Nov 2019 16:24:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 5: This depends on https://gerrit.cloudera.org/#/c/14632/4 -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 12 Nov 2019 17:46:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5006/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 12 Nov 2019 17:43:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join
Tim Armstrong has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/14688 ) Change subject: IMPALA-9127: explicit probe state machine in hash join .. IMPALA-9127: explicit probe state machine in hash join This refactors the main loop in PartitionedHashJoinNode::GetNext() to use an explicit state machine, rather than the hard-to-follow implicit state machine previously used. A new state variable 'probe_state_' is used to drive the loop, with DCHECKs enforcing invariants of other member variables. I deliberately tried to minimise changes to other functions (including any attempts to factor logic out of GetNext()) to minimise the scope of this patch. The new logic is mostly equivalent to the old logic, although there may be a different number of trips through the loop because of the way the cascading checks in the old version worked. A few notable changes: * DoneProbing() is consistently called when probing is finished, including in cases, like probing a single spilled partition, where it wasn't previously. * The repeated AtCapacity() checks are consolidated into a single check that happens at the end of the loop. Resources attached to batches should still be flushed at the appropriate points, since each previous "if (out_batch->AtCapacity()) break;" corresponds to a new loop iteration in the new code. * OutputNullAwareNullProbe() and OutputNullAwareProbeRows() now explicitly signal when they are done using an output argument, instead of implicitly via AtCapacity(), which is incredibly error-prone. Testing: We have adequate coverage for different join modes, including with spilling. * Ran exhaustive tests. * Ran a single node stress test with TPC-H and TPC-DS * Ran a single node stress test with a debug action to force spilling: --impalad_args="-default_query_options=DEBUG_ACTION=-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@0.5" Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 --- M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h 3 files changed, 329 insertions(+), 201 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/14688/5 -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong