Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16242 )

Change subject: IMPALA-9979: part 2: partitioned top-n
......................................................................


Patch Set 30:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/16242/30//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16242/30//COMMIT_MSG@49
PS30, Line 49: The
> nit: 'This'
Done


http://gerrit.cloudera.org:8080/#/c/16242/30//COMMIT_MSG@57
PS30, Line 57: both
             : the partitioning
> nit: ..and non-partitioned case ?
Done


http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.h
File be/src/exec/topn-node.h:

http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.h@133
PS30, Line 133: partition are returned
> Would be good to add how the returned partitions are ordered among themselv
It seems worth being clear about the invariant, so added it here.


http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.cc
File be/src/exec/topn-node.cc:

http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.cc@53
PS30, Line 53: the the
> nit: repetition 'the'
D'oh


http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.cc@59
PS30, Line 59: the the
> nit:  repetition 'the'
D'oh


http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.cc@398
PS30, Line 398:     RETURN_IF_CANCELLED(state);
> I was wondering if the inner while loop (line 418) should also have a cance
The inner loop is bounded by the batch size so should be fine (the general rule 
has been to check for cancellation at O(batch size) intervals).


http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.cc@559
PS30, Line 559:   // We evict heaps starting with the heaps that were least 
effective at filtering
> The eviction policy is interesting.  Considering that a rank()/rownum() can
Maybe I didn't quite get the point, but we should be able to start filtering as 
soon as we have N elements in a heap for the partition - after that point, each 
new row is either in the top N we've seen so far (in which case we discard a 
previous row) or it's not (in which case we discard the new row).


http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.cc@666
PS30, Line 666:       // The key references memory in 'tuple_pool_'. Replace it 
with a rematerialized tuple.
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/analysis/SortInfo.java@279
PS30, Line 279:   public long estimateTopNMaterializedSize(long totalRows) {
> Should this estimate account for any per-partition overhead (additional met
We could probably add 100 bytes or so to account for the size of the heap, but 
I feel like it probably doesn't really affect estimates all that much and adds 
a bit of complexity.

I did realise that the name of this method is a bit misleading, so renamed it 
to reflect that it's just estimating the size of that many rows.


http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java:

http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@646
PS30, Line 646: limit on
> nit: incomplete sentence
Done


http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
File fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java:

http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@453
PS30, Line 453:           if (limit.limit > 1) {
> Maybe simplify this if-else to  planNodeLimit = Math.max(limit.limit, 1)
Done


http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@461
PS30, Line 461:           if (partitionByExprs.isEmpty()) {
> There was a recent issue with constant partition-by and order-by exprs (htt
I modified this to check for whether they are all literals and added a test. 
There's a bit of a distinction between "constant", which means assumed constant 
within a query and literals that are guaranteed constant, so I used the latter.


http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@885
PS30, Line 885:     /// Whether the source predicate was a simple < or <= 
inequality.
> This sounded confusing at first ..I interpreted it to mean true for < (the
Done


http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@928
PS30, Line 928: to .
> nit:  incomplete
Done


http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@1158
PS30, Line 1158: returns
> nit: 'returns ties'
Done


http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/SelectNode.java
File fe/src/main/java/org/apache/impala/planner/SelectNode.java:

http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/SelectNode.java@81
PS30, Line 81:       selectNode.conjuncts_.addAll(conjuncts);
> We should check if the child's conjuncts being added are unique compared to
That's a good point. I don't understand exactly when duplicate conjuncts would 
or wouldn't be created, but it makes sense to handle it here.


http://gerrit.cloudera.org:8080/#/c/16242/30/testdata/workloads/functional-planner/queries/PlannerTest/analytic-rank-pushdown.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/analytic-rank-pushdown.test:

http://gerrit.cloudera.org:8080/#/c/16242/30/testdata/workloads/functional-planner/queries/PlannerTest/analytic-rank-pushdown.test@55
PS30, Line 55: rnk
> nit: This extra rnk is not needed since doing a select * from the subquery
Done


http://gerrit.cloudera.org:8080/#/c/16242/30/testdata/workloads/functional-planner/queries/PlannerTest/analytic-rank-pushdown.test@755
PS30, Line 755: # Analytic predicate can be migrated into inline view, but not 
through the analytic
> Is this true even if the rank predicate is on the top level subquery, not i
I updated this comment to make it clearer which predicates it was referring to. 
int_col refers to the first_value() function but can't be pushed down.


http://gerrit.cloudera.org:8080/#/c/16242/30/testdata/workloads/functional-planner/queries/PlannerTest/analytic-rank-pushdown.test@1741
PS30, Line 1741: # Other predicates referencing rank() and row_number() can't be
> For the cases where the optimization is not applied maybe the DISTRIBUTEDPL
That's a great point.


http://gerrit.cloudera.org:8080/#/c/16242/30/testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-partitioned-top-n.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-partitioned-top-n.test:

http://gerrit.cloudera.org:8080/#/c/16242/30/testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-partitioned-top-n.test@76
PS30, Line 76: # row_number() predicate on equality instead of range.
> In the previous patch (IMPALA-10296) equality predicate was disallowed exce
The comment needed updating. The limit from the final TOP-N is not in fact 
pushed down here because it isn't safe.


http://gerrit.cloudera.org:8080/#/c/16242/30/testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-partitioned-top-n.test@777
PS30, Line 777: # Limit pushdown should not be applied.
> But partitioned top-n is created.  It would be good to clarify if (a) no op
Done



--
To view, visit http://gerrit.cloudera.org:8080/16242
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5
Gerrit-Change-Number: 16242
Gerrit-PatchSet: 30
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Feb 2021 04:11:59 +0000
Gerrit-HasComments: Yes

Reply via email to