[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 10: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 26 Nov 2019 11:31:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 26 Nov 2019 06:51:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5297/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 26 Nov 2019 06:51:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 9: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 26 Nov 2019 03:52:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5134/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Nov 2019 23:44:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 9: Code-Review+2

carry


--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Nov 2019 21:41:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5293/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Nov 2019 21:41:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 8:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.h@86
PS8, Line 86: that in
> that are used in?
Done here and below


http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.h@404
PS8, Line 404: ProbeState* next_state
> Might be nice to make this a 'bool* done' instead, to be consistent with th
Done


http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.h@421
PS8, Line 421: probe_state_
> nit: 'probe_state_'
Done


http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.cc@57
PS8, Line 57: state_(HashJoinState::PARTITIONING_BUILD),
> Any reason not to just put this initialization the header?
Nope, moved the other constant initialisers there too.


http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.cc@1134
PS8, Line 1134:   DCHECK_EQ(probe_batch_pos_, -1);
> DCHECK_ENUM_EQ(probe_state_, ProbeState::PROBING_END_BATCH)?}
Done


http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.cc@1141
PS8, Line 1141: DCHECK
> DCHECK_ENUM_EQ
Done



--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Nov 2019 21:41:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-25 Thread Tim Armstrong (Code Review)
Hello Michael Ho, Thomas Tauber-Marshall, Csaba Ringhofer, Bikramjeet Vig, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14688

to look at the new patch set (#9).

Change subject: IMPALA-9127: explicit probe state machine in hash join
..

IMPALA-9127: explicit probe state machine in hash join

This refactors the main loop in PartitionedHashJoinNode::GetNext()
to use an explicit state machine, rather than the hard-to-follow
implicit state machine previously used. A new state variable
'probe_state_' is used to drive the loop, with DCHECKs enforcing
invariants of other member variables.

I deliberately tried to minimise changes to other functions
(including any attempts to factor logic out of GetNext())
to minimise the scope of this patch.

The new logic is mostly equivalent to the old logic, although there
may be a different number of trips through the loop because of the
way the cascading checks in the old version worked. A few notable
changes:
* DoneProbing() is consistently called when probing is finished,
  including in cases, like probing a single spilled partition, where
  it wasn't previously.
* The repeated AtCapacity() checks are consolidated into a single
  check that happens at the end of the loop. Resources attached
  to batches should still be flushed at the appropriate points,
  since each previous "if (out_batch->AtCapacity()) break;"
  corresponds to a new loop iteration in the new code.
* OutputNullAwareNullProbe() and OutputNullAwareProbeRows() now
  explicitly signal when they are done using an output argument,
  instead of implicitly via AtCapacity(), which is incredibly
  error-prone.

Testing:
We have adequate coverage for different join modes, including
with spilling.

* Ran exhaustive tests.
* Ran a single node stress test with TPC-H and TPC-DS
* Ran a single node stress test with larger scale factor

Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
---
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
3 files changed, 363 insertions(+), 213 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/14688/9
--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-25 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 8: Code-Review+2

(6 comments)

http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.h@86
PS8, Line 86: that in
that are used in?


http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.h@404
PS8, Line 404: ProbeState* next_state
Might be nice to make this a 'bool* done' instead, to be consistent with the 
methods below and to keep all of the places where we set the ProbeState 
completely contained in GetNext(). Not a big deal, though


http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.h@421
PS8, Line 421: probe_state_
nit: 'probe_state_'


http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.cc@57
PS8, Line 57: state_(HashJoinState::PARTITIONING_BUILD),
Any reason not to just put this initialization the header?


http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.cc@1134
PS8, Line 1134:   DCHECK_EQ(probe_batch_pos_, -1);
DCHECK_ENUM_EQ(probe_state_, ProbeState::PROBING_END_BATCH)?}


http://gerrit.cloudera.org:8080/#/c/14688/8/be/src/exec/partitioned-hash-join-node.cc@1141
PS8, Line 1141: DCHECK
DCHECK_ENUM_EQ



--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Nov 2019 18:50:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5111/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 22 Nov 2019 01:57:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5110/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 22 Nov 2019 01:23:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-21 Thread Tim Armstrong (Code Review)
Hello Michael Ho, Thomas Tauber-Marshall, Csaba Ringhofer, Bikramjeet Vig, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14688

to look at the new patch set (#8).

Change subject: IMPALA-9127: explicit probe state machine in hash join
..

IMPALA-9127: explicit probe state machine in hash join

This refactors the main loop in PartitionedHashJoinNode::GetNext()
to use an explicit state machine, rather than the hard-to-follow
implicit state machine previously used. A new state variable
'probe_state_' is used to drive the loop, with DCHECKs enforcing
invariants of other member variables.

I deliberately tried to minimise changes to other functions
(including any attempts to factor logic out of GetNext())
to minimise the scope of this patch.

The new logic is mostly equivalent to the old logic, although there
may be a different number of trips through the loop because of the
way the cascading checks in the old version worked. A few notable
changes:
* DoneProbing() is consistently called when probing is finished,
  including in cases, like probing a single spilled partition, where
  it wasn't previously.
* The repeated AtCapacity() checks are consolidated into a single
  check that happens at the end of the loop. Resources attached
  to batches should still be flushed at the appropriate points,
  since each previous "if (out_batch->AtCapacity()) break;"
  corresponds to a new loop iteration in the new code.
* OutputNullAwareNullProbe() and OutputNullAwareProbeRows() now
  explicitly signal when they are done using an output argument,
  instead of implicitly via AtCapacity(), which is incredibly
  error-prone.

Testing:
We have adequate coverage for different join modes, including
with spilling.

* Ran exhaustive tests.
* Ran a single node stress test with TPC-H and TPC-DS
* Ran a single node stress test with larger scale factor

Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
---
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
3 files changed, 352 insertions(+), 201 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/14688/8
--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 8: Code-Review+1

PS7 was the wrong version, sorry about that


--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 22 Nov 2019 00:46:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 7: Code-Review+1

Rebased, carry +1


--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 22 Nov 2019 00:38:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-21 Thread Tim Armstrong (Code Review)
Hello Michael Ho, Thomas Tauber-Marshall, Csaba Ringhofer, Bikramjeet Vig, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14688

to look at the new patch set (#7).

Change subject: IMPALA-9127: explicit probe state machine in hash join
..

IMPALA-9127: explicit probe state machine in hash join

This refactors the main loop in PartitionedHashJoinNode::GetNext()
to use an explicit state machine, rather than the hard-to-follow
implicit state machine previously used. A new state variable
'probe_state_' is used to drive the loop, with DCHECKs enforcing
invariants of other member variables.

I deliberately tried to minimise changes to other functions
(including any attempts to factor logic out of GetNext())
to minimise the scope of this patch.

The new logic is mostly equivalent to the old logic, although there
may be a different number of trips through the loop because of the
way the cascading checks in the old version worked. A few notable
changes:
* DoneProbing() is consistently called when probing is finished,
  including in cases, like probing a single spilled partition, where
  it wasn't previously.
* The repeated AtCapacity() checks are consolidated into a single
  check that happens at the end of the loop. Resources attached
  to batches should still be flushed at the appropriate points,
  since each previous "if (out_batch->AtCapacity()) break;"
  corresponds to a new loop iteration in the new code.
* OutputNullAwareNullProbe() and OutputNullAwareProbeRows() now
  explicitly signal when they are done using an output argument,
  instead of implicitly via AtCapacity(), which is incredibly
  error-prone.

Testing:
We have adequate coverage for different join modes, including
with spilling.

* Ran exhaustive tests.
* Ran a single node stress test with TPC-H and TPC-DS
* Ran a single node stress test with a debug action to
  force spilling:
  
--impalad_args="-default_query_options=DEBUG_ACTION=-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@0.5"

Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
---
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
3 files changed, 327 insertions(+), 200 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/14688/7
--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-21 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 6: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 21 Nov 2019 20:56:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5066/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 19 Nov 2019 18:22:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-19 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14688/5/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

http://gerrit.cloudera.org:8080/#/c/14688/5/be/src/exec/partitioned-hash-join-node.h@204
PS5, Line 204: beein
> typo
Done


http://gerrit.cloudera.org:8080/#/c/14688/5/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

http://gerrit.cloudera.org:8080/#/c/14688/5/be/src/exec/partitioned-hash-join-node.cc@522
PS5, Line 522:   case ProbeState::PROBING_IN_BATCH: {
> optional: I think that it would be nice to move this and the next case into
Done. It was pretty easy to extract some of the logic. I kept the logic 
relating to state transitions in this function so that the state machine is 
defined in this lop as much as possible.



--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 19 Nov 2019 17:37:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-19 Thread Tim Armstrong (Code Review)
Hello Michael Ho, Thomas Tauber-Marshall, Csaba Ringhofer, Bikramjeet Vig, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14688

to look at the new patch set (#6).

Change subject: IMPALA-9127: explicit probe state machine in hash join
..

IMPALA-9127: explicit probe state machine in hash join

This refactors the main loop in PartitionedHashJoinNode::GetNext()
to use an explicit state machine, rather than the hard-to-follow
implicit state machine previously used. A new state variable
'probe_state_' is used to drive the loop, with DCHECKs enforcing
invariants of other member variables.

I deliberately tried to minimise changes to other functions
(including any attempts to factor logic out of GetNext())
to minimise the scope of this patch.

The new logic is mostly equivalent to the old logic, although there
may be a different number of trips through the loop because of the
way the cascading checks in the old version worked. A few notable
changes:
* DoneProbing() is consistently called when probing is finished,
  including in cases, like probing a single spilled partition, where
  it wasn't previously.
* The repeated AtCapacity() checks are consolidated into a single
  check that happens at the end of the loop. Resources attached
  to batches should still be flushed at the appropriate points,
  since each previous "if (out_batch->AtCapacity()) break;"
  corresponds to a new loop iteration in the new code.
* OutputNullAwareNullProbe() and OutputNullAwareProbeRows() now
  explicitly signal when they are done using an output argument,
  instead of implicitly via AtCapacity(), which is incredibly
  error-prone.

Testing:
We have adequate coverage for different join modes, including
with spilling.

* Ran exhaustive tests.
* Ran a single node stress test with TPC-H and TPC-DS
* Ran a single node stress test with larger scale factor

Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
---
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
3 files changed, 354 insertions(+), 202 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/14688/6
--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-19 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14688/5/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

http://gerrit.cloudera.org:8080/#/c/14688/5/be/src/exec/partitioned-hash-join-node.h@204
PS5, Line 204: beein
typo


http://gerrit.cloudera.org:8080/#/c/14688/5/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

http://gerrit.cloudera.org:8080/#/c/14688/5/be/src/exec/partitioned-hash-join-node.cc@522
PS5, Line 522:   case ProbeState::PROBING_IN_BATCH: {
optional: I think that it would be nice to move this and the next case into 
different functions, e.g. ProcessCurrentProbeBatch() and NextProbeBatch() (and 
probably rename NextProbeRowBatch() to get NextProbeRowBatchFromChild() or 
something similar). The functions could be ordered like the old logic to reduce 
the diff.



--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 19 Nov 2019 16:24:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 5:

This depends on https://gerrit.cloudera.org/#/c/14632/4


--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 12 Nov 2019 17:46:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5006/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 12 Nov 2019 17:43:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9127: explicit probe state machine in hash join

2019-11-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/14688 )

Change subject: IMPALA-9127: explicit probe state machine in hash join
..

IMPALA-9127: explicit probe state machine in hash join

This refactors the main loop in PartitionedHashJoinNode::GetNext()
to use an explicit state machine, rather than the hard-to-follow
implicit state machine previously used. A new state variable
'probe_state_' is used to drive the loop, with DCHECKs enforcing
invariants of other member variables.

I deliberately tried to minimise changes to other functions
(including any attempts to factor logic out of GetNext())
to minimise the scope of this patch.

The new logic is mostly equivalent to the old logic, although there
may be a different number of trips through the loop because of the
way the cascading checks in the old version worked. A few notable
changes:
* DoneProbing() is consistently called when probing is finished,
  including in cases, like probing a single spilled partition, where
  it wasn't previously.
* The repeated AtCapacity() checks are consolidated into a single
  check that happens at the end of the loop. Resources attached
  to batches should still be flushed at the appropriate points,
  since each previous "if (out_batch->AtCapacity()) break;"
  corresponds to a new loop iteration in the new code.
* OutputNullAwareNullProbe() and OutputNullAwareProbeRows() now
  explicitly signal when they are done using an output argument,
  instead of implicitly via AtCapacity(), which is incredibly
  error-prone.

Testing:
We have adequate coverage for different join modes, including
with spilling.

* Ran exhaustive tests.
* Ran a single node stress test with TPC-H and TPC-DS
* Ran a single node stress test with a debug action to
  force spilling:
  
--impalad_args="-default_query_options=DEBUG_ACTION=-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@0.5"

Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
---
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
3 files changed, 329 insertions(+), 201 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/14688/5
--
To view, visit http://gerrit.cloudera.org:8080/14688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064
Gerrit-Change-Number: 14688
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong