Tim Armstrong has posted comments on this change. Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs ......................................................................
Patch Set 4: (18 comments) http://gerrit.cloudera.org:8080/#/c/5914/4/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: Line 257: // non-deterministic expression. I don't think we should trust users not to set this to false on non-deterministic expressions - they shouldn't be able to potentially crash an Impala daemon with a bad query option value. http://gerrit.cloudera.org:8080/#/c/5914/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 60: private final static List<String> NONDETERMINISTIC_BUILTINS = https://gerrit.cloudera.org/#/c/5904/ also factored out this list, so when you rebase on top of master we should be able to reuse that. http://gerrit.cloudera.org:8080/#/c/5914/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java File fe/src/main/java/org/apache/impala/analysis/SortInfo.java: Line 39: private static final int SORT_MATERIALIZATION_COST_THRESHOLD = 5; Comment on how it's derived? If it's somewhat arbitrary its good for readers to know. Line 225: analyzer.addWarning("Expression '" + orderingExpr.toSql() + " may be " + I mentioned in an earlier comment - this seems to be unsafe - we shouldn't trust the query option too much. Line 246: // LiteralExprs don't affect the sort order. TODO: consider removing the sort node Do we have a test that covers the case when all ordering exprs are eliminated? Line 263: orderingExprsIter.set(materializedRef); Is this needed? Is this handled by putting it in the substitution map? http://gerrit.cloudera.org:8080/#/c/5914/4/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java: Line 152: if (mergeInfo_.getIsMaterialized().get(i)) { Also mentioned on the design doc - I think we should only show this at a higher explain levels. Any maybe spell out "MATERIALIZED" so that it's less cryptic. http://gerrit.cloudera.org:8080/#/c/5914/4/fe/src/main/java/org/apache/impala/planner/SortNode.java File fe/src/main/java/org/apache/impala/planner/SortNode.java: Line 187: if (info_.getIsMaterialized().get(i)) { See previous comment http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test: Line 785: | order by: id ASC MAT I guess it's useful for these planner tests to show it, so maybe it is good to have it at this explain level. Would be interested what others think. http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test File testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test: Line 46: | order by: int_col ASC NULLS FIRST, tinyint_col ASC NULLS FIRST It's weird that there's no "MAT" here but there are some in the sort below - is that a bug? It seems like we should be showing all plain slotrefs as materialised. Line 98: | order by: int_col ASC Same here Line 454: | order by: tinyint_col + 1 ASC NULLS FIRST, double_col / 2 ASC NULLS FIRST, 4 - int_col ASC, 4 * smallint_col ASC Looks like the cost-based analysis is working here Line 789: | order by: double_col ASC NULLS FIRST, tinyint_col ASC NULLS FIRST, int_col ASC It seems like we're only materialising things in the bottom-most sort for some reason. http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-planner/queries/PlannerTest/order.test File testdata/workloads/functional-planner/queries/PlannerTest/order.test: Line 1245: ==== It would be good to also add some basic planner tests for the materialize_sort query option - just as as sanity check that it actually has the intended effect. http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test File testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test: Line 1921: # Tests 'order by' on a non-deterministic function, return order isn't guaranteed. It would also be good to test the materialize_sort option for analytics http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-query/queries/QueryTest/sort.test File testdata/workloads/functional-query/queries/QueryTest/sort.test: Line 4146: select id from alltypestiny order by random() It would be good to add a couple of tests for the materialize_sort query option to make sure that it produces the correct results. Line 4150: select id from alltypestiny order by "AAA" I think this answers my earlier question about test coverage. http://gerrit.cloudera.org:8080/#/c/5914/4/tests/query_test/test_sort.py File tests/query_test/test_sort.py: PS4, Line 158: TestSort TestRandomSort? -- To view, visit http://gerrit.cloudera.org:8080/5914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
