Lars Volker has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements ......................................................................
Patch Set 4: (24 comments) Thanks for the comments, please see PS4. As pointed out in one of the replies I will look at Kudu support next. http://gerrit.cloudera.org:8080/#/c/4745/2//COMMIT_MSG Commit Message: Line 11: plan, just before the table sink. This has the effect that data will be > -distributed (also adds to the single-node plan) Done http://gerrit.cloudera.org:8080/#/c/4745/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 121: private boolean hasClusteredHint_ = false; > but how do you capture 'noclustered'? that's not the same as the absence of Right now the default is 'noclustered' and is not captured explicitly. Once the default changes to 'clustered' we can change the code to reflect that. http://gerrit.cloudera.org:8080/#/c/4745/2/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: Line 238: */ > move this TODO into SortInfo.createSortTupleInfo() Done Line 253: Preconditions.checkState(smap.getLhs().get(i) instanceof SlotRef); > these few lines can also be moved into createSortTupleInfo() right? Excellent, I didn't see this. http://gerrit.cloudera.org:8080/#/c/4745/2/fe/src/main/java/org/apache/impala/analysis/SortInfo.java File fe/src/main/java/org/apache/impala/analysis/SortInfo.java: Line 125: > refer to the new tuple materialized by this sort instead of the original in Done Line 145: public SortInfo clone() { return new SortInfo(this); } > shorter: by the sort node Done Line 146: > comment about result expressions only makes sense if we pass in the result True, now it's aligned with the resultExprs actually being passed in. Line 147: /** > we also do this for top-n Done Line 152: * sorts. The substitution map is returned. > I'd suggest removing the 'stmt' parameter and instead returning the ExprEub Done Line 153: * TODO: We could do something more sophisticated than simply copying input slot refs - > Isn't this the tuple descriptor for the sort output? The secnd sentence is I think it is both, since the sort node operates only on these. However output seems to be more intuitive, so I changed it. Line 163: > builds up the tuple operated on and returned by sort node Done Line 171: // substOrderBy is the mapping from slot refs in the sort node's input to slot refs in > I think the else case may not be needed/possible, but the explanation is th Yes, thanks for the explanation. Line 173: // the tuple operated on and returned by the sort node. > Can we also move this to the caller? The returned smap can be used to creat Done Line 181: sortTupleExprs.add(origSlotRef); > let's let the caller make this decision without passing a flag Done http://gerrit.cloudera.org:8080/#/c/4745/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: Line 275: private PlanFragment createScanFragment(PlanNode node) { > never mind, the structure looks good as it is. Ok. http://gerrit.cloudera.org:8080/#/c/4745/2/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: Line 149: // Add optional sort node to the plan, based on clustered/noclustered plan hint. > Let's keep it concrete: we're adding a sort node Done Line 151: // set up table sink for root fragment > there's no ClusteringNode. something like 'clusterInsertInput'/ 'createClus Done Line 497: * This will sort the data produced by 'inputFragment' by the clustering columns, so > Mention the ordering of the sort Done Line 498: * that partitions can be written sequentially in the table sink. > Second sentence is kind of vague. I understand that sorting is just an inst Done Line 503: if (!insertStmt.hasClusteredHint()) return; > single line (and elsewhere) Done Line 508: > partitioning? I took that comment from DistributedPlanner::createInsertFragment() and I thought it made sense to remove all constants during the clustering. Changed the word. http://gerrit.cloudera.org:8080/#/c/4745/2/testdata/workloads/functional-planner/queries/PlannerTest/insert.test File testdata/workloads/functional-planner/queries/PlannerTest/insert.test: Line 573: # IMPALA-2521: clustered insert into partitioned table adds sort node. > Let's test these combinations: Done Line 575: select * from functional.alltypes > Let's also test Kudu tables because there we want to sort on the PKs. Let's I'll try to address this in a subsequent patchset, so we can make progress on the rest. Line 576: ---- PLAN > also add ---- PLAN section Done -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-HasComments: Yes