Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs ......................................................................
Patch Set 5: (16 comments) http://gerrit.cloudera.org:8080/#/c/6322/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java File fe/src/main/java/org/apache/impala/analysis/SortInfo.java: Line 39: // All Exprs with cost greater than this will be materialized. Since we don't currently > All ordering exprs with cost... Done Line 40: // have any information about actual function costs, this value guarantees that all > guarantees -> is intended to ensure Done Line 42: // operations unmaterialized, for example 'SlotRef + SlotRef' would have cost 3 and not > remove cost "3" since this might change and it will be hard to tell this co Done Line 167: * result expressions. Those slot refs in the ordering and result exprs are substituted > The "Done by" sentence sounds a little procedural/complicated, can you cond Done Line 170: * TODO: We could do something more sophisticated than simply copying input slot refs - > exprs Done Line 181: // exprs that get materialized to slot refs in the materialized sort tuple. > Can be improved for clarity. Also point 2 is not quite accurate. Done Line 191: TreeNode.collect(resultExprs, Predicates.instanceOf(SlotRef.class), sourceSlots); > // SlotRefs in the result and ordering exprs after substituting the materia Done Line 201: sortTupleExprs.add(origSlotRef); > contains() is expensive (and every iteration might grow the rhs). Done Line 222: * the input to the sort with setMaterializedTupleInfo(), and a mapping from those > Easier to read if you have bullet points for the things we materialize, e.g Done http://gerrit.cloudera.org:8080/#/c/6322/4/testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test File testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test: Line 1: # expensive order exprs get materialized, cheap ones don't > space after # Done Line 118: | > Same as case above? We don't materialize in a merging exchange, the precedi Done Line 139 > Need additional tests to cover the tricky code to avoid redundant expr mate Done http://gerrit.cloudera.org:8080/#/c/6322/4/tests/query_test/test_sort.py File tests/query_test/test_sort.py: Line 158: class TestRandomSort(ImpalaTestSuite): > This will run over all test vectors right? I think it makes sense to only r Because I'm not taking 'vector' as a parameter to the tests, they will only be run once, even under exhaustive. Not sure what the preferred flow here is - how I have it, or taking the vector but restricting it as you suggested, to make the intentions clearer. Line 172: # Include "random()" in the select list to check that its sorted correctly. > it's Done Line 174: "select random() as r from functional.alltypessmall order by r").data, > also add one with an inline view: Done Line 185: """Tests that a window function over 'order by random()' worked as expected.""" > works as expected Done -- To view, visit http://gerrit.cloudera.org:8080/6322 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
