Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11698 )

Change subject: IMPALA-5004: Switch to sorting node for large TopN queries
......................................................................


Patch Set 4:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java@236
PS4, Line 236: is
the


http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java@238
PS4, Line 238:    * average tuple serialized size</code>.
offset is not mentioned so either include it or remove it since the arg 
descriptions and the one-liner definition make things pretty clear already.


http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java@240
PS4, Line 240: @param
Even though this style of commenting is widely used, Impala tries not to use it 
(yes, there are exceptions in the code base, but its on the rare side). If a 
file already uses them, then pls stick with the style of that file, but in this 
case, lets stick with the default style so remove the '@param'.


http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@325
PS4, Line 325: estimateTopNMaterializedSize
as pointed out in the header for SortInfo, the abstractions are not quite 
right. however, this call-site looks fairly complicated. compueMemLayout does 
not redo if its already been called, so perhaps simplify this to let SortInfo 
deal with its descriptors and calling computeMemLayout? You can have a static 
method do what estimateTopNMaterializedSize does now so that it can be used in 
SortInfo as well as from the SortNode.


http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@326
PS4, Line 326: getAvgSerializedSize()
this looks correct and consistent with the avgRowSize_ computation for 
PlanNodes.


http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@334
PS4, Line 334: estimatedTopNMaterializedSize
if we get this wrong, perhaps due to a mistaken many-to-many join cardinality 
estimate in some descendant child, what is the easiest way to turn this 
decision off? set topn_bytes_limit to max long?


http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SortNode.java
File fe/src/main/java/org/apache/impala/planner/SortNode.java:

http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SortNode.java@260
PS4, Line 260:       nodeResourceProfile_ = ResourceProfile.noReservation(
where'd the return go? pls look into why tests did not get bothered by this... 
the profile could have been overwritten down below, in which case we'd get 
possibly different estimates.


http://gerrit.cloudera.org:8080/#/c/11698/4/testdata/workloads/functional-planner/queries/PlannerTest/topn-default-limit-bytes.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/topn-default-limit-bytes.test:

http://gerrit.cloudera.org:8080/#/c/11698/4/testdata/workloads/functional-planner/queries/PlannerTest/topn-default-limit-bytes.test@1
PS4, Line 1: triggers sort
seems to apply to the second test, not the first, so got confused by this.



--
To view, visit http://gerrit.cloudera.org:8080/11698
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9
Gerrit-Change-Number: 11698
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Paul Rogers <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Wed, 24 Oct 2018 17:19:41 +0000
Gerrit-HasComments: Yes

Reply via email to