Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11698 )

Change subject: IMPALA-5004: Switch to sorting node for large TopN queries
......................................................................


Patch Set 5:

(8 comments)

Addressed comments and made a few more changes:
* Changed query-options.cc to use ParseMemValue to parse the value of 
topn_bytes_limit since it is expressed in bytes
* Re-organized the tests slightly into three distinct tests: (1) test with the 
default value of topn_bytes_limit, (2) test with a small value of 
topn_bytes_limit, and (3) test with topn_bytes_limit = 0 which disables the 
changes
* A value of 0 or -1 for topn_bytes_limit now disables the changes

http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java@236
PS4, Line 236: th
> the
Done


http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java@238
PS4, Line 238:    * average tuple serialized size</code>. 'cardinality' is the 
cardinality of the TopN
> offset is not mentioned so either include it or remove it since the arg des
The "estimated # of rows in memory" refers to the sum of the cardinality and 
offset. I don't explicitly mention the offset, because one can infer from the 
method body that "estimated # of rows in memory" = "cardinality + offset".

If you still think the formula is redundant, I can remove it.


http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java@240
PS4, Line 240:
> Even though this style of commenting is widely used, Impala tries not to us
Done


http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@325
PS4, Line 325: rdinality =
> as pointed out in the header for SortInfo, the abstractions are not quite r
Moved the tuple handling into estimateTopNMaterializedSize. I couldn't make it 
static since it needs to reference the tuple descriptors of the current 
SortInfo.

estimateTopNMaterializedSize is called by the SingleNodePlanner and the 
SortNode now.


http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@326
PS4, Line 326: th.min(root.cardinalit
> this looks correct and consistent with the avgRowSize_ computation for Plan
Done


http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@334
PS4, Line 334:  then it is likely we are sor
> if we get this wrong, perhaps due to a mistaken many-to-many join cardinali
Yeah, but that doesn't seem ideal. So I the most recent patch allows settings 
topn_bytes_limit to 0 or -1 which disables the decision entirely.


http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SortNode.java
File fe/src/main/java/org/apache/impala/planner/SortNode.java:

http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SortNode.java@260
PS4, Line 260:       nodeResourceProfile_ = ResourceProfile.noReservation(
> where'd the return go? pls look into why tests did not get bothered by this
I didn't run the full suite of tests, only the topn functional-query tests, 
which don't validate memory estimates by default.


http://gerrit.cloudera.org:8080/#/c/11698/4/testdata/workloads/functional-planner/queries/PlannerTest/topn-default-limit-bytes.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/topn-default-limit-bytes.test:

http://gerrit.cloudera.org:8080/#/c/11698/4/testdata/workloads/functional-planner/queries/PlannerTest/topn-default-limit-bytes.test@1
PS4, Line 1:
> seems to apply to the second test, not the first, so got confused by this.
Done



--
To view, visit http://gerrit.cloudera.org:8080/11698
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9
Gerrit-Change-Number: 11698
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Paul Rogers <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Thu, 25 Oct 2018 02:13:17 +0000
Gerrit-HasComments: Yes

Reply via email to