Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-5498: Support for partial sorts in Kudu INSERTs ......................................................................
Patch Set 6: (14 comments) http://gerrit.cloudera.org:8080/#/c/7267/6//COMMIT_MSG Commit Message: Line 7: IMPALA-5498: Support for partial sorts > lets specify in Kudu for now Done PS6, Line 28: A > nit: This also adds... Done Line 32: used previously for inserts into Kudu. > "for inserts into Kudu"->"for DML statements into Kudu tables only. Future This only applies to inserts/upserts. We still haven't done the partitioning/sorting for update/delete. http://gerrit.cloudera.org:8080/#/c/7267/6/be/src/exec/partial-sort-node.h File be/src/exec/partial-sort-node.h: PS6, Line 62: derived based > awkward/confusing Done PS6, Line 62: sort_exec_exprs_ > update Done http://gerrit.cloudera.org:8080/#/c/7267/6/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: PS6, Line 1383: RunSize > can you indicate the units, e.g. #rows -> NumRowsPerRun? Done PS6, Line 1387: if (no_spill_) { : // 'min_buffers_required' is 1 since the sorter will only have 1 run at a time, so : // there won't be any merges. : min_buffers_required = 1; : } else { : min_buffers_required = MIN_BUFFERS_PER_MERGE; : } > nit: ternary operator Done http://gerrit.cloudera.org:8080/#/c/7267/6/be/src/runtime/sorter.h File be/src/runtime/sorter.h: PS6, Line 45: Specifying 'no_spill' as true is > For this use case, 'no_spill' should be set to true so the sorter can reduc Done PS6, Line 103: no_spill > A positive argument would be marginally easier to follow. E.g. 'enable_spil Of course, I called it no_spill to mirror the function names. Do you think we should also change AddBatch/AddBatchNoSpill -> AddBatchWithSpill/AddBatch? PS6, Line 103: bool no_spill = false > comment on this even though it's mentioned above Done Line 117: /// up, it is sorted and a new unsorted run is created. > comment that this may not be called if no_spill is true Done PS6, Line 127: InputDone > comment that this may not be called if no_spill is true So this does have a DCHECK in it, but only if there's more than one run (which isn't possible unless AddBatch() was called), so its fine to call with either value of enable_spilling. The other cases would be the various private *Merge* functions. I didn't add DCHECKs/documentation to those cause it seems cluttered and the DCHECK here in InputDone() ensures that none of them are called, but I can add it in if you want. http://gerrit.cloudera.org:8080/#/c/7267/6/fe/src/main/java/org/apache/impala/planner/SortNode.java File fe/src/main/java/org/apache/impala/planner/SortNode.java: PS6, Line 56: TODO: run experiments to : // determine the value for this, consider making it configurable, enforce it in the BE. > JIRA? Done PS6, Line 124: { : return type_ != TSortType.PARTIAL; : } > nit: oneline? Done -- To view, visit http://gerrit.cloudera.org:8080/7267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Zach Amsden <zams...@cloudera.com> Gerrit-HasComments: Yes