Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5498: Support for partial sorts in Kudu INSERTs
......................................................................


Patch Set 6:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/7267/6//COMMIT_MSG
Commit Message:

Line 7: IMPALA-5498: Support for partial sorts
> lets specify in Kudu for now
Done


PS6, Line 28: A
> nit: This also adds...
Done


Line 32: used previously for inserts into Kudu.
> "for inserts into Kudu"->"for DML statements into Kudu tables only. Future 
This only applies to inserts/upserts. We still haven't done the 
partitioning/sorting for update/delete.


http://gerrit.cloudera.org:8080/#/c/7267/6/be/src/exec/partial-sort-node.h
File be/src/exec/partial-sort-node.h:

PS6, Line 62: derived based
> awkward/confusing
Done


PS6, Line 62: sort_exec_exprs_
> update
Done


http://gerrit.cloudera.org:8080/#/c/7267/6/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

PS6, Line 1383: RunSize
> can you indicate the units, e.g. #rows -> NumRowsPerRun?
Done


PS6, Line 1387:   if (no_spill_) {
              :     // 'min_buffers_required' is 1 since the sorter will only 
have 1 run at a time, so
              :     // there won't be any merges.
              :     min_buffers_required = 1;
              :   } else {
              :     min_buffers_required = MIN_BUFFERS_PER_MERGE;
              :   }
> nit: ternary operator
Done


http://gerrit.cloudera.org:8080/#/c/7267/6/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

PS6, Line 45: Specifying 'no_spill' as true is
> For this use case, 'no_spill' should be set to true so the sorter can reduc
Done


PS6, Line 103: no_spill
> A positive argument would be marginally easier to follow. E.g. 'enable_spil
Of course, I called it no_spill to mirror the function names. Do you think we 
should also change AddBatch/AddBatchNoSpill -> AddBatchWithSpill/AddBatch?


PS6, Line 103: bool no_spill = false
> comment on this even though it's mentioned above
Done


Line 117:   /// up, it is sorted and a new unsorted run is created.
> comment that this may not be called if no_spill is true
Done


PS6, Line 127: InputDone
> comment that this may not be called if no_spill is true
So this does have a DCHECK in it, but only if there's more than one run (which 
isn't possible unless AddBatch() was called), so its fine to call with either 
value of enable_spilling.

The other cases would be the various private *Merge* functions. I didn't add 
DCHECKs/documentation to those cause it seems cluttered and the DCHECK here in 
InputDone() ensures that none of them are called, but I can add it in if you 
want.


http://gerrit.cloudera.org:8080/#/c/7267/6/fe/src/main/java/org/apache/impala/planner/SortNode.java
File fe/src/main/java/org/apache/impala/planner/SortNode.java:

PS6, Line 56: TODO: run experiments to
            :   // determine the value for this, consider making it 
configurable, enforce it in the BE.
> JIRA?
Done


PS6, Line 124: {
             :     return type_ != TSortType.PARTIAL;
             :   }
> nit: oneline?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <zams...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to