Alex Behm has posted comments on this change. Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs ......................................................................
Patch Set 2: (15 comments) http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 292: // We currently don't have a way to indicate if a UDF is deterministic, so just always > Not sure what you mean by 'two separate members". Maybe Expr.containsNonDet Sorry for the confusion, let me try to clarify: You added a new member Expr.isDeterministic_ which is set to false if the Expr is not deterministic or a UDF. I'm saying we should not mix the UDF and deterministic concepts. One way to do that is to have separate members like containsNonDeterministicBuiltin_ and containsUDF_. It might be even easier to add a Predicate in Expr like IS_EQ_BINARY_PREDICATE and then use Expr.contains() in the appropriate places. I agree that FunctionCallExpr.isNondeterministicBuiltinFn() and Expr.isDeterministic() return consistent values, but that seems confusing to be (they return consistent values for non-builtins because we consider non-builtins to be non-deterministic) http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/SortInfo.java File fe/src/main/java/org/apache/impala/analysis/SortInfo.java: Line 228: * exprs that get materialized. > There's already a lot in this review, and I wasn't sure how easy this would Good to know that the sort will work. I agree we should tackle this later, probably easiest to just leave the literals alone for now. http://gerrit.cloudera.org:8080/#/c/6322/2/fe/src/main/java/org/apache/impala/analysis/SortInfo.java File fe/src/main/java/org/apache/impala/analysis/SortInfo.java: Line 51: // Original Exprs for any ordering exprs that are materialized. Populated in Subset of ordering exprs that are materialized. Populated ... Line 166: * output by the sort node. Done by materializing slot refs in the order-by and given update comment Line 170: * TODO: We could do something more sophisticated than simply copying input slot refs - TODO is addressed Line 174: List<Expr> resultExprs, Analyzer analyzer) { do these fit into the previous line? Line 180: // substOrderBy is a mapping from slot refs in the sort node's input and ordering Comment is confusing because of the original 'sort node's input' which is not clear. I suggest rephrasing to be more explicit about what the left-hand side of the smap contains: 1. materialized ordering Exprs 2. SlotRefs of non-materialized ordering Exprs 3. SlotRefs of resultExprs after substituting materialized ordering exprs I think point (3) is not yet reflected in the code, but we should do it imo. Example: select concat(a, b, c, d) as x from t order by x The current code will materialize concat(), but also the SlotRefs a, b, c, d. Line 192: List<Expr> nonmaterializedOrderingExprs = Lists.newArrayList(orderingExprs_); nonMaterializedOrderingExprs Line 204: // The ordering exprs are still the original exprs and still point to the old slot Suggest simplifying comment to something like: The ordering exprs are evaluated against the sort tuple, so must reflect our materialization decision above. Line 218: * threshold, or don't have a cost set, by creating SlotRefs for them with shorter: by creating slots for them in 'sortTupleDesc' (we are adding slots to the tuple, not SlotRefs) Line 230: public void createMaterializedOrderExprs( Considering all my comments here and elsewhere I think the simplest interface is: public ExprSubstitutionMap createMaterializedOrderExprs(TupleDescriptor sortTupleDesc, Analyzer analyzer) Line 240: if (origOrderingExpr instanceof LiteralExpr) { Thinking about this again: This behavior complicates the code and function comment above, so I think it would be better to leave literals alone for now. This is only a partial solution to the literals problem anyway. Consider something like: select * from (select 1 as col from t) x order by col We'll ultimately still sort on a literal, so maybe this is not the right place for this check. Even further, I think this solution will not work through inline views at all, e.g.: select * from (select uuid() as col from t) x order by col Let's think about how to address this problem and whether we need to do it now or defer. http://gerrit.cloudera.org:8080/#/c/6322/2/testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test File testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test: Line 1: # expensive order exprs get materialized, cheap ones don't Tests should ideally cover the 3 cases: 1. non-deterministic exprs 2. exprs above/below the cost threshold 3. exprs with an unknown cost (do we have these or are they a "bug"?) Line 102: ---- DISTRIBUTEDPLAN no need for this http://gerrit.cloudera.org:8080/#/c/6322/1/testdata/workloads/functional-planner/queries/PlannerTest/sort-materialization.test File testdata/workloads/functional-planner/queries/PlannerTest/sort-materialization.test: Line 19 > I'm not sure how to make that work with PlannerTest, but I added a test to I think you should be able to add a function using FrontendTestBase.addTestFunction() for a planner test. You are right about the UDA, not sure what I was thinking. -- To view, visit http://gerrit.cloudera.org:8080/6322 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
