Lars Volker has posted comments on this change.

Change subject: IMPALA-2521: Add clustered hint to insert statements
......................................................................


Patch Set 4:

(24 comments)

Thanks for the comments, please see PS4. As pointed out in one of the replies I 
will look at Kudu support next.

http://gerrit.cloudera.org:8080/#/c/4745/2//COMMIT_MSG
Commit Message:

Line 11: plan, just before the table sink. This has the effect that data will be
> -distributed (also adds to the single-node plan)
Done


http://gerrit.cloudera.org:8080/#/c/4745/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

Line 121:   private boolean hasClusteredHint_ = false;
> but how do you capture 'noclustered'? that's not the same as the absence of
Right now the default is 'noclustered' and is not captured explicitly. Once the 
default changes to 'clustered' we can change the code to reflect that.


http://gerrit.cloudera.org:8080/#/c/4745/2/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

Line 238:    */
> move this TODO into SortInfo.createSortTupleInfo()
Done


Line 253:       Preconditions.checkState(smap.getLhs().get(i) instanceof 
SlotRef);
> these few lines can also be moved into createSortTupleInfo() right?
Excellent, I didn't see this.


http://gerrit.cloudera.org:8080/#/c/4745/2/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

Line 125: 
> refer to the new tuple materialized by this sort instead of the original in
Done


Line 145:   public SortInfo clone() { return new SortInfo(this); }
> shorter: by the sort node
Done


Line 146: 
> comment about result expressions only makes sense if we pass in the result 
True, now it's aligned with the resultExprs actually being passed in.


Line 147:   /**
> we also do this for top-n
Done


Line 152:    * sorts. The substitution map is returned.
> I'd suggest removing the 'stmt' parameter and instead returning the ExprEub
Done


Line 153:    * TODO: We could do something more sophisticated than simply 
copying input slot refs -
> Isn't this the tuple descriptor for the sort output? The secnd sentence is 
I think it is both, since the sort node operates only on these. However output 
seems to be more intuitive, so I changed it.


Line 163: 
> builds up the tuple operated on and returned by sort node
Done


Line 171:     // substOrderBy is the mapping from slot refs in the sort node's 
input to slot refs in
> I think the else case may not be needed/possible, but the explanation is th
Yes, thanks for the explanation.


Line 173:     // the tuple operated on and returned by the sort node.
> Can we also move this to the caller? The returned smap can be used to creat
Done


Line 181:       sortTupleExprs.add(origSlotRef);
> let's let the caller make this decision without passing a flag
Done


http://gerrit.cloudera.org:8080/#/c/4745/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

Line 275:   private PlanFragment createScanFragment(PlanNode node) {
> never mind, the structure looks good as it is.
Ok.


http://gerrit.cloudera.org:8080/#/c/4745/2/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

Line 149:       // Add optional sort node to the plan, based on 
clustered/noclustered plan hint.
> Let's keep it concrete: we're adding a sort node
Done


Line 151:       // set up table sink for root fragment
> there's no ClusteringNode. something like 'clusterInsertInput'/ 'createClus
Done


Line 497:    * This will sort the data produced by 'inputFragment' by the 
clustering columns, so
> Mention the ordering of the sort
Done


Line 498:    * that partitions can be written sequentially in the table sink.
> Second sentence is kind of vague. I understand that sorting is just an inst
Done


Line 503:     if (!insertStmt.hasClusteredHint()) return;
> single line (and elsewhere)
Done


Line 508: 
> partitioning?
I took that comment from DistributedPlanner::createInsertFragment() and I 
thought it made sense to remove all constants during the clustering. Changed 
the word.


http://gerrit.cloudera.org:8080/#/c/4745/2/testdata/workloads/functional-planner/queries/PlannerTest/insert.test
File testdata/workloads/functional-planner/queries/PlannerTest/insert.test:

Line 573: # IMPALA-2521: clustered insert into partitioned table adds sort node.
> Let's test these combinations:
Done


Line 575: select * from functional.alltypes
> Let's also test Kudu tables because there we want to sort on the PKs. Let's
I'll try to address this in a subsequent patchset, so we can make progress on 
the rest.


Line 576: ---- PLAN
> also add ---- PLAN section
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to