Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
......................................................................


Patch Set 4:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/5914/4/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

Line 257:   // non-deterministic expression.
I don't think we should trust users not to set this to false on 
non-deterministic expressions - they shouldn't be able to potentially crash an 
Impala daemon with a bad query option value.


http://gerrit.cloudera.org:8080/#/c/5914/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 60:   private final static List<String> NONDETERMINISTIC_BUILTINS =
https://gerrit.cloudera.org/#/c/5904/ also factored out this list, so when you 
rebase on top of master we should be able to reuse that.


http://gerrit.cloudera.org:8080/#/c/5914/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

Line 39:   private static final int SORT_MATERIALIZATION_COST_THRESHOLD = 5;
Comment on how it's derived? If it's somewhat arbitrary its good for readers to 
know.


Line 225:             analyzer.addWarning("Expression '" + orderingExpr.toSql() 
+ " may be " +
I mentioned in an earlier comment - this seems to be unsafe - we shouldn't 
trust the query option too much.


Line 246:         // LiteralExprs don't affect the sort order. TODO: consider 
removing the sort node
Do we have a test that covers the case when all ordering exprs are eliminated?


Line 263:         orderingExprsIter.set(materializedRef);
Is this needed? Is this handled by putting it in the substitution map?


http://gerrit.cloudera.org:8080/#/c/5914/4/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

Line 152:         if (mergeInfo_.getIsMaterialized().get(i)) {
Also mentioned on the design doc - I think we should only show this at a higher 
explain levels. Any maybe spell out "MATERIALIZED" so that it's less cryptic.


http://gerrit.cloudera.org:8080/#/c/5914/4/fe/src/main/java/org/apache/impala/planner/SortNode.java
File fe/src/main/java/org/apache/impala/planner/SortNode.java:

Line 187:         if (info_.getIsMaterialized().get(i)) {
See previous comment


http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test:

Line 785: |  order by: id ASC MAT
I guess it's useful for these planner tests to show it, so maybe it is good to 
have it at this explain level. Would be interested what others think.


http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test:

Line 46: |  order by: int_col ASC NULLS FIRST, tinyint_col ASC NULLS FIRST
It's weird that there's no "MAT" here but there are some in the sort below - is 
that a bug? It seems like we should be showing all plain slotrefs as 
materialised.


Line 98: |  order by: int_col ASC
Same here


Line 454: |  order by: tinyint_col + 1 ASC NULLS FIRST, double_col / 2 ASC 
NULLS FIRST, 4 - int_col ASC, 4 * smallint_col ASC
Looks like the cost-based analysis is working here


Line 789: |  order by: double_col ASC NULLS FIRST, tinyint_col ASC NULLS FIRST, 
int_col ASC
It seems like we're only materialising things in the bottom-most sort for some 
reason.


http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-planner/queries/PlannerTest/order.test
File testdata/workloads/functional-planner/queries/PlannerTest/order.test:

Line 1245: ====
It would be good to also add some basic planner tests for the materialize_sort 
query option - just as as sanity check that it actually has the intended effect.


http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
File testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test:

Line 1921: # Tests 'order by' on a non-deterministic function, return order 
isn't guaranteed.
It would also be good to test the materialize_sort option for analytics


http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-query/queries/QueryTest/sort.test
File testdata/workloads/functional-query/queries/QueryTest/sort.test:

Line 4146: select id from alltypestiny order by random()
It would be good to add a couple of tests for the materialize_sort query option 
to make sure that it produces the correct results.


Line 4150: select id from alltypestiny order by "AAA"
I think this answers my earlier question about test coverage.


http://gerrit.cloudera.org:8080/#/c/5914/4/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

PS4, Line 158: TestSort
TestRandomSort?


-- 
To view, visit http://gerrit.cloudera.org:8080/5914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to