Alex Behm has posted comments on this change.

Change subject: IMPALA-2521: Add clustered hint to insert statements

Patch Set 2:

Commit Message:

Line 11: distributed plan, just before the table sink. This has the effect that
-distributed (also adds to the single-node plan)
File fe/src/main/java/org/apache/impala/analysis/

Line 238:    * TODO: We could do something more sophisticated than simply 
copying input
move this TODO into SortInfo.createSortTupleInfo()

Line 253:     Set<SlotRef> sourceSlots = Sets.newHashSet();
these few lines can also be moved into createSortTupleInfo() right?
File fe/src/main/java/org/apache/impala/analysis/

Line 125:    * the ordering exprs refer to the materialized tuples instead of 
the original input.
refer to the new tuple materialized by this sort instead of the original input.

Line 145:    * output by the exec node implementing the sort. Done by 
materializing slot refs in the
shorter: by the sort node

Line 146:    * order-by and result expressions. Those slot refs in the ordering 
and result exprs are
comment about result expressions only makes sense if we pass in the result 
expressions to this function, otherwise it's not really clear where those come 

Line 147:    * substituted with slot refs into the new tuple. This simplifies 
sorting logic for
we also do this for top-n

Line 152:   public void createSortTupleInfo(Set<SlotRef> sourceSlots, boolean 
materializeSlots, StatementBase stmt, Analyzer analyzer) {
I'd suggest removing the 'stmt' parameter and instead returning the 
ExprEubstitutionMap. Then the stmt callers can apply that smap to whatever 
exprs they need to change.

Line 153:     // The tuple descriptor for the sort input. It will contain the 
materialized tuples,
Isn't this the tuple descriptor for the sort output? The secnd sentence is a 
little imprecise because ("it contains") because a tuple descriptor does not 
contain materialized tuples.

Line 163:     // the internal rows of the sort node.
builds up the tuple operated on and returned by sort node

Line 171:       // TODO: I copied this from and would like to 
add an explanatory
I think the else case may not be needed/possible, but the explanation is that 
it's ok to move predicates outside of this query block, but it's not ok to move 
predicates in if there is a limit. Consider this query:

select count(*) from t1 inner join
(select id, name from t2
 where id < 10
 order by limit 10) v
on ( = and =
where = 'test'

In the example above it's ok to generate ' < 10' and put it into the scan 
of t1, but it's not ok to generate ' = 'test'' and move it into the scan 
of t2.

Hope this makes sense.

Line 173:       if (stmt.hasLimit()) {
Can we also move this to the caller? The returned smap can be used to create 
this. The reason is that it doesn't really make sense to add these value 
transfers during plan generation (for the clustered hint use case).

Line 181:     if (materializeSlots) {
let's let the caller make this decision without passing a flag
File fe/src/main/java/org/apache/impala/planner/

Line 149:       // Add optional clustering node to the plan, based on 
clustered/noclustered plan
Let's keep it concrete: we're adding a sort node

Line 497:    * Insert a sort node into the plan, depending on the 
clustered/noclustered plan hint.
Mention the ordering of the sort

Line 498:    * This will re-order the data produced by 'inputFragment' in a way 
that partitions can
Second sentence is kind of vague. I understand that sorting is just an instance 
of a more general clustering concept, but let's stick to what we are actually 
doing here :)

Line 503:     if (!insertStmt.hasClusteredHint()) {
single line (and elsewhere)

Line 508:     // Ignore constants for the sake of partitioning.
File testdata/workloads/functional-planner/queries/PlannerTest/insert.test:

Line 573: # IMPALA-2521: Test to make sure that clustering adds a sort node 
into the plan.
Let's test these combinations:
1. Partitioned tables
1.1. With clustered hint
1.2. With clustered and noshuffle hint
2. Unpartitioned tables
2.1. With clustered hint
2.2. With clustered and shuffle hint

Line 575: select * from functional.alltypes
Let's also test Kudu tables because there we want to sort on the PKs. Let's add 
those tests to a Kudu-specific .test file

also add ---- PLAN section

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-HasComments: Yes

Reply via email to