Dan Hecht has posted comments on this change.
Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Patch Set 14:
Here's a first set of comments, mostly focused on phjn.h. I need to go back
through phjn.cc, builder.h/cc.
Line 207: state, Substitute(PREPARE_FAILED_ERROR_MSG, "write"));
old code and agg uses state_->block_mgr()->MemLimitTooLowError(). Any reason
this uses MemLimitExceeded() directly instead? Just wondering the reasoning.
Line 79: /// Logically, the algorithm has the following modes:.
I think this comment could be a lot more useful if we explain how the phases
work together to give the full algorithm. Probably that could replace the
state transition graph which I find confusing.
PS14, Line 81: partition them into hash_partitions_. The input can
this comment is kind of disjoint since it talks about what the phase does, then
where the input comes from, and then explains more about what the phase does.
Let's format each phase consistently, maybe describe inputs first, then
actions, then outputs?
PS14, Line 86: PROCESSING
why do we use PROCESSING term here rather than PARTITIONING, which would be
PS14, Line 87: spilled
(or remove single from #3 since having it in one place but not the other makes
the reader wonder if there's a difference).
Line 89: /// table in memory, we perform the join, otherwise we spill
the probe row.
this last sentence is confusing since it mostly restates what the first says
adds a small amount of new info. Let's combine them.
PS14, Line 90: PROBING_SPILLED_PARTITION
it would be good to be more upfront about when this phase is used and how, for
a spilled partition, we either do this phase or REPARTITIONING_PROBE phase.
PS14, Line 90: construct
what does this phase construct? isn't the hash table already constructed?
PS14, Line 91: walking
what does this mean? processing?
also, from these comments it's not clear how these stages relate. You could
read this to mean that #2 a fallback if we can't do #3, or that #3 is the
probing that "perform the join" of #2.
PS14, Line 97: *
what does this star mean to say? We can go from REPARTITIONING_PROBE back to
PS14, Line 143: buffer
PS14, Line 264: the
PS14, Line 267: the
PS14, Line 291: Walks
What does this mean. Iterates over?
PS14, Line 291: hash partitions
is this talking about the probe hash partitions or the builder's?
PS14, Line 384: builder_->hash_partitions
is this referring to the entries in the builder_->hash_partition_ array?
PS14, Line 385: This is not used when processing a single partition.
what does this mean?
PS14, Line 410: we then iterate over all the probe rows
I can't tell if this means literally all the probe rows, or all the rows in
this stream. clarify this paragraph.
PS14, Line 432: and
this makes it sound like there are two conditions required to create a probe
partition, but I think you mean this is implied by the first part. "because"?
PS14, Line 436: preallocated
what does this mean?
PS14, Line 437: and
double and. And is this saying the constructor prepares it or the caller
should have already? is this right:
... should be an unpinned stream that has been prepared for writing with ...
PS14, Line 443: should
PS14, Line 464: meaning
: /// it has to call Close() on it) but transferred to the
parent exec node
this doesn't seem right. don't we either Close() it or transfer it (not both),
PS14, Line 409: DCHECK
Line 412: children_.insert(children_.begin(), child_entry);
rather than having two special cases (here and line 395-399, how about making
AddChildLocked() take the position iterator? This case passes begin() and then
AddChild() case passes either end() or the middle iterator.
To view, visit http://gerrit.cloudera.org:8080/3873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>