[Impala-ASF-CR] Fix formatting of HBaseScanNode explain output.
Shant Hovsepian has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15749 Change subject: Fix formatting of HBaseScanNode explain output. .. Fix formatting of HBaseScanNode explain output. In the case with more than one hbase predicate the indentation level wasn't correctly formatted in the explain string. Instead of: | | 13:SCAN HBASE [default.dimension d] | | hbase filters: | | d:foo EQUAL '1' | | d:bar EQUAL '2' | | d:baz EQUAL '3' | | predicate: This was produced: | | 13:SCAN HBASE [default.dimension d] | | hbase filters: d:foo EQUAL '1' d:bar EQUAL '2' d:baz EQUAL '3' | | predicate: Change-Id: I30fad791408a1f7e35e9b3f2e6cb4958952dd567 --- M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test 2 files changed, 19 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/15749/1 -- To view, visit http://gerrit.cloudera.org:8080/15749 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I30fad791408a1f7e35e9b3f2e6cb4958952dd567 Gerrit-Change-Number: 15749 Gerrit-PatchSet: 1 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-9727: Fix HBaseScanNode explain formatting
Hello Quanlong Huang, Thomas Tauber-Marshall, Fang-Yu Rao, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15749 to look at the new patch set (#2). Change subject: IMPALA-9727: Fix HBaseScanNode explain formatting .. IMPALA-9727: Fix HBaseScanNode explain formatting In the case with more than one hbase predicate the indentation level wasn't correctly formatted in the explain string. Instead of: | | 13:SCAN HBASE [default.dimension d] | | hbase filters: | | d:foo EQUAL '1' | | d:bar EQUAL '2' | | d:baz EQUAL '3' | | predicate: This was produced: | | 13:SCAN HBASE [default.dimension d] | | hbase filters: d:foo EQUAL '1' d:bar EQUAL '2' d:baz EQUAL '3' | | predicate: Change-Id: I30fad791408a1f7e35e9b3f2e6cb4958952dd567 --- M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test 2 files changed, 18 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/15749/2 -- To view, visit http://gerrit.cloudera.org:8080/15749 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I30fad791408a1f7e35e9b3f2e6cb4958952dd567 Gerrit-Change-Number: 15749 Gerrit-PatchSet: 2 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-9727: Fix HBaseScanNode explain formatting
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/15749 ) Change subject: IMPALA-9727: Fix HBaseScanNode explain formatting .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/15749/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15749/1//COMMIT_MSG@7 PS1, Line 7: Fix formatting of HBaseScanNode explain output. > Since this is not an urgent hot-fix, could you create a JIRA for this? Done http://gerrit.cloudera.org:8080/#/c/15749/1/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java File fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java: http://gerrit.cloudera.org:8080/#/c/15749/1/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java@591 PS1, Line 591: + "'" + filter.filter_constant + "'" > nit: could you move this to the previous line? Agreed will do, but this was the formatted enforced by clang-format. -- To view, visit http://gerrit.cloudera.org:8080/15749 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30fad791408a1f7e35e9b3f2e6cb4958952dd567 Gerrit-Change-Number: 15749 Gerrit-PatchSet: 1 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 05 May 2020 22:40:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10099: Push down DISTINCT in Set operations
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16350 ) Change subject: IMPALA-10099: Push down DISTINCT in Set operations .. Patch Set 4: > Patch Set 3: > > (1 comment) Good point. The functional tests were given an Analysis Error for the pushdown in those cases, though added planner tests for it as well. -- To view, visit http://gerrit.cloudera.org:8080/16350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia248f1595df2ab48fbe70c778c7c32bde5c518a5 Gerrit-Change-Number: 16350 Gerrit-PatchSet: 4 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 30 Aug 2020 06:37:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10099: Push down DISTINCT in Set operations
Hello Aman Sinha, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16350 to look at the new patch set (#4). Change subject: IMPALA-10099: Push down DISTINCT in Set operations .. IMPALA-10099: Push down DISTINCT in Set operations INTERSECT/EXCEPT are not duplicate preserving operations. The distinct aggregations can happen in each operand, the leftmost operand only, or after all the operands in a separate aggregation step. Except for a couple special cases we would use the last strategy most often. This change pushes the distinct aggregation down to the leftmost operand in cases where there are no analytic functions, or when a distinct or grouping operation already eliminates duplicates. In general DISTINCT placement such as in this case should be done throughout the entire plan tree in a cost based manner as described in IMPALA-5260 Testing: * TpcdsPlannerTest * PlannerTest * TPC-DS 30TB Perf run for any affected queries - Q14-1 180s -> 150s - Q14-2 109s -> 90s - Q8 no significant change * SetOperation Planner Tests * Analyzer tests * Tpcds Functional Workload Change-Id: Ia248f1595df2ab48fbe70c778c7c32bde5c518a5 --- M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M testdata/workloads/functional-planner/queries/PlannerTest/empty.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test 6 files changed, 2,049 insertions(+), 1,806 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/16350/4 -- To view, visit http://gerrit.cloudera.org:8080/16350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia248f1595df2ab48fbe70c778c7c32bde5c518a5 Gerrit-Change-Number: 16350 Gerrit-PatchSet: 4 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10064: Support constant propagation for eligible range predicates
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16346 ) Change subject: IMPALA-10064: Support constant propagation for eligible range predicates .. Patch Set 9: Code-Review+1 (1 comment) LGTM nice little fix. http://gerrit.cloudera.org:8080/#/c/16346/7/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test: http://gerrit.cloudera.org:8080/#/c/16346/7/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test@461 PS7, Line 461: timestamp_col <= '2010-12-01'; > Done Done -- To view, visit http://gerrit.cloudera.org:8080/16346 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I811a1f8d605c27c7704d7fc759a91510c6db3c2b Gerrit-Change-Number: 16346 Gerrit-PatchSet: 9 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 31 Aug 2020 21:39:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7310: Partial fix for NDV cardinality with NULLs.
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16349 ) Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. .. Patch Set 12: > Patch Set 12: Verified-1 > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6368/ Flaky test? shell.test_shell_interactive.TestImpalaShellInteractive.test_history_does_not_duplicate_on_interrupt[table_format_and_file_extension: ('textfile', '.txt') | protocol: hs2] (from pytest) E TIMEOUT: Timeout exceeded. E E version: 3.3 E command: /home/ubuntu/Impala/shell/build/impala-shell-4.0.0-SNAPSHOT/impala-shell E args: ['/home/ubuntu/Impala/shell/build/impala-shell-4.0.0-SNAPSHOT/impala-shell', '--protocol=hs2', '-ilocalhost:21050'] E searcher: E buffer (last 100 chars): ' default> select 2;\r\n^C\r\n[localhost:21050] default> ' E before (last 100 chars): ' default> select 2;\r\n^C\r\n[localhost:21050] default> ' E after: E match: None E match_index: None E exitstatus: None E flag_eof: False E pid: 12993 E child_fd: 24 E closed: False E timeout: 30 E delimiter: E logfile: None E logfile_read: None E logfile_send: None E maxread: 2000 E ignorecase: False E searchwindowsize: None E delaybeforesend: 0.05 E delayafterclose: 0.1 E delayafterterminate: 0.1 -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 12 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 01 Sep 2020 03:07:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9979: part 2: partitioned top-n
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16242 ) Change subject: IMPALA-9979: part 2: partitioned top-n .. Patch Set 17: (6 comments) http://gerrit.cloudera.org:8080/#/c/16242/13//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16242/13//COMMIT_MSG@27 PS13, Line 27: be converted to limits. I think row_number is more commonly used in queries. Maybe a more generic name would help. I'm seen some literature use RANKING functions for row_number, rank, dense_rank, and REPORTING functions for sum,count,avg. Not sure how common that convention is though. http://gerrit.cloudera.org:8080/#/c/16242/13//COMMIT_MSG@68 PS13, Line 68: the Might want to have JIRA for NTILE as well for queries like 95th percentile and above. Though a TopN wouldn't work in that case it would just be a filter pushdown. http://gerrit.cloudera.org:8080/#/c/16242/16//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16242/16//COMMIT_MSG@28 PS16, Line 28: they c enabled http://gerrit.cloudera.org:8080/#/c/16242/13/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: http://gerrit.cloudera.org:8080/#/c/16242/13/be/src/exec/exec-node.cc@188 PS13, Line 188: PARTITIONED_TOPN Random observation, N is usually the size of the whole set, K is usually some smaller subset. It's interesting we ended up with the nomenclature of TOPN instead of TOPK. Not worth changing it. http://gerrit.cloudera.org:8080/#/c/16242/16/be/src/exec/topn-node.cc File be/src/exec/topn-node.cc: http://gerrit.cloudera.org:8080/#/c/16242/16/be/src/exec/topn-node.cc@197 PS16, Line 197: one one one http://gerrit.cloudera.org:8080/#/c/16242/16/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java File fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java: http://gerrit.cloudera.org:8080/#/c/16242/16/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@103 PS16, Line 103: logic logical -- To view, visit http://gerrit.cloudera.org:8080/16242 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5 Gerrit-Change-Number: 16242 Gerrit-PatchSet: 17 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 11 Sep 2020 04:07:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9979: part 2: partitioned top-n
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16242 ) Change subject: IMPALA-9979: part 2: partitioned top-n .. Patch Set 17: (1 comment) http://gerrit.cloudera.org:8080/#/c/16242/17/testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test: http://gerrit.cloudera.org:8080/#/c/16242/17/testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test@69 PS17, Line 69: 100 The implicit type cast is missing for the literal. -- To view, visit http://gerrit.cloudera.org:8080/16242 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5 Gerrit-Change-Number: 16242 Gerrit-PatchSet: 17 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 11 Sep 2020 16:48:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8291: Show constraints in DESCRIBE FORMATTED
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16428 ) Change subject: IMPALA-8291: Show constraints in DESCRIBE FORMATTED .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/16428/1/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java File fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java: http://gerrit.cloudera.org:8080/#/c/16428/1/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java@118 PS1, Line 118: if (!isFormatted) { : for (int i = 0; i < fields.length; i++) { : Object value = StringEscapeUtils.escapeJava(fields[i]); : if (value != null) { : tableInfo.append(value); : } : tableInfo.append((i == fields.length - 1) ? LINE_DELIM : FIELD_DELIM); : } > Looks like these are dead codes since isFormatted is always true. Are there Yes it's just copied from Hive where it looks like dead code as well https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java#L582 I'll remove it and add a comment. http://gerrit.cloudera.org:8080/#/c/16428/1/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java@423 PS1, Line 423: .intern() > nit: unneccessary intern Yup, just copied from Hive like that https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java#L187 What do you think is it better to keep it matching Hive or update it? -- To view, visit http://gerrit.cloudera.org:8080/16428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I676b69c465c46491f870d7fdc894e7474c030356 Gerrit-Change-Number: 16428 Gerrit-PatchSet: 1 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Sat, 12 Sep 2020 18:19:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10167: Docs typo for DEFAULT TRANSACTIONAL TYPE
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16447 ) Change subject: IMPALA-10167: Docs typo for DEFAULT_TRANSACTIONAL_TYPE .. Patch Set 1: Small docs typo fix. -- To view, visit http://gerrit.cloudera.org:8080/16447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1d9c985c8c202c8716bb5102a23c1e70cf6b68c Gerrit-Change-Number: 16447 Gerrit-PatchSet: 1 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 14 Sep 2020 16:00:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10167: Docs typo for DEFAULT TRANSACTIONAL TYPE
Shant Hovsepian has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16447 Change subject: IMPALA-10167: Docs typo for DEFAULT_TRANSACTIONAL_TYPE .. IMPALA-10167: Docs typo for DEFAULT_TRANSACTIONAL_TYPE Fix for typo referring to DEFAULT_TRANSACTIONAL_TYPE as DEFAULT_TRANSACTION_TYPE. Change-Id: Ib1d9c985c8c202c8716bb5102a23c1e70cf6b68c --- M docs/topics/impala_default_transactional_type.xml 1 file changed, 3 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/16447/1 -- To view, visit http://gerrit.cloudera.org:8080/16447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib1d9c985c8c202c8716bb5102a23c1e70cf6b68c Gerrit-Change-Number: 16447 Gerrit-PatchSet: 1 Gerrit-Owner: Shant Hovsepian
[Impala-ASF-CR] IMPALA-7310: Partial fix for NDV cardinality with NULLs.
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16349 ) Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/16349/10/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/16349/10/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@103 PS10, Line 103: // Adjust an ndv of zero to 1 if stats indicate there are null values. > When the numDistinctValues > 0, such adjustment is not performed. I wonder Yes the intent of this patch per earlier notes is to only address the =0 cases as the the general fix is a bit involved at the moment. -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 10 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 31 Aug 2020 17:28:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8291: Show constraints in DESCRIBE FORMATTED
Shant Hovsepian has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16428 Change subject: IMPALA-8291: Show constraints in DESCRIBE FORMATTED .. IMPALA-8291: Show constraints in DESCRIBE FORMATTED Support for displaying primary and foreign key constraints in describe formatted output. The output attempts to be as close to Hive's implementation as possible. Also includes constraint definitions for the TPC-DS test workload. Testing: * Fresh load of testdata * Metadata query tests comparing the output between Impala and Hive Change-Id: I676b69c465c46491f870d7fdc894e7474c030356 --- M fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M testdata/datasets/tpcds/tpcds_schema_template.sql M tests/metadata/test_metadata_query_statements.py 5 files changed, 784 insertions(+), 533 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/16428/1 -- To view, visit http://gerrit.cloudera.org:8080/16428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I676b69c465c46491f870d7fdc894e7474c030356 Gerrit-Change-Number: 16428 Gerrit-PatchSet: 1 Gerrit-Owner: Shant Hovsepian
[Impala-ASF-CR] IMPALA-8291: Show constraints in DESCRIBE FORMATTED
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16428 ) Change subject: IMPALA-8291: Show constraints in DESCRIBE FORMATTED .. Patch Set 1: Hi guys I tried to follow the standard of copying Hive code into the Metadata format utils. Let me know if the thought process has changed around this. @David could you sanity check the tpc-ds constraints? I had to rearrange the order of loading tables to satisfy the constraint definitions. Still doing another pass on tests. -- To view, visit http://gerrit.cloudera.org:8080/16428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I676b69c465c46491f870d7fdc894e7474c030356 Gerrit-Change-Number: 16428 Gerrit-PatchSet: 1 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 08 Sep 2020 20:50:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7310: Partial fix for NDV cardinality with NULLs.
Hello Aman Sinha, Qifan Chen, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16349 to look at the new patch set (#11). Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. .. IMPALA-7310: Partial fix for NDV cardinality with NULLs. This fix just handles the case where a column's cardinality is zero however it's nullable and we have null stats to indicate there are null values, therefore we adjust the cardinality from 0 to 1. The cardinality of zero was especially problematic when calculating cardinalities for multiple predicates with multiplication. The 0 would propagate up the plan tree and result in poor plan choices such as always using broadcast joins where shuffle would've been more optimal. Testing: * 26 Node TPC-DS 30TB run had better plans for Q4 and Q11 - Q4 172s -> 80s - Q11 103s -> 77s * CardinalityTest * TpcdsPlannerTest Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 --- M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test 6 files changed, 795 insertions(+), 784 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16349/11 -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 11 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7310: Partial fix for NDV cardinality with NULLs.
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16349 ) Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. .. Patch Set 11: (10 comments) Yeah I guess some of those comments in the tests came from different patches, all cleaned up now. http://gerrit.cloudera.org:8080/#/c/16349/10/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/16349/10/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@98 PS10, Line 98: adjustNumD > Maybe renamed as getNumDistinctValuesAdjusted(). made it adjustNumDistinctValues, since this is a private method want to avoid the get/set verbs as not to conflate with standard conventions for public methods. http://gerrit.cloudera.org:8080/#/c/16349/10/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java File fe/src/main/java/org/apache/impala/catalog/ColumnStats.java: http://gerrit.cloudera.org:8080/#/c/16349/10/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java@191 PS10, Line 191: > nit: seems like a move of the method in the module. Done http://gerrit.cloudera.org:8080/#/c/16349/10/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java: http://gerrit.cloudera.org:8080/#/c/16349/10/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@211 PS10, Line 211: verifySelectCol("nullrows", "null_str", > This comment can be removed. Done http://gerrit.cloudera.org:8080/#/c/16349/10/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java: http://gerrit.cloudera.org:8080/#/c/16349/10/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java@182 PS10, Line 182: // NDV(blanks) = 1, add 1 for nulls : // Bug: See IMPALA-7310, IMPALA-8094 : //verifyNdvStmt("SELECT blanks FROM functional.nullrows", 2); > Seems like these lines can be removed. Actually no this is a different issue, just adjusted the references. http://gerrit.cloudera.org:8080/#/c/16349/10/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java File fe/src/test/java/org/apache/impala/planner/CardinalityTest.java: http://gerrit.cloudera.org:8080/#/c/16349/10/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java@132 PS10, Line 132: group_str h > This comment is not accurate. Done http://gerrit.cloudera.org:8080/#/c/16349/10/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java@136 PS10, Line 136: null_str is al > Seems like the reference to c is not right here. Done http://gerrit.cloudera.org:8080/#/c/16349/10/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java@138 PS10, Line 138: i > same here. Done http://gerrit.cloudera.org:8080/#/c/16349/10/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java@140 PS10, Line 140: (g > same Done http://gerrit.cloudera.org:8080/#/c/16349/10/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java@140 PS10, Line 140: i > same here Done http://gerrit.cloudera.org:8080/#/c/16349/10/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java@182 PS10, Line 182: = 1 > Maybe as // NDV(id) = 26, ndv(null_str) = 1, NDV(id)*ndv(null_str) = 26. Done -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 11 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 31 Aug 2020 19:11:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9974: Join elimination based on referential integrity.
Shant Hovsepian has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16182 Change subject: IMPALA-9974: Join elimination based on referential integrity. .. IMPALA-9974: Join elimination based on referential integrity. Use defined referential dependency constraints during planning. 1. For join ndv calculations involving a single column primary key we now use the table's row count instead of the column's ndv stats which are approximate and incorrect at times. 2. Convert LEFT SEMI JOIN to INNER JOIN if the inner table's primary key is used as the join key. 3. Convert LEFT OUTER JOIN to INNER JOIN if the join key uses a fk/pk relationship, the FK is not nullable. 4. Prune INNER JOINS on fk/pk conditions where the join doesn't affect the output cardinality and there are no predicates or non-key columns in use on the inner side. 5. For a confirmed fk/pk join we include a IS NOT NULL predicate on the foreign key columns. Plan optimizations can be disabled with the query option DISABLE_INFORMATIONAL_CONSTRAINTS Testing: - New constraint rewrite planner test - TPC-DS functional tests - TPC-DS Planner test changes Some TPC-DS 30TB Performance Improvements: - Q72 74.3s -> 24.8s - Q82 13.5s -> 2.2s - Q84 19.33s -> 6.07s - Q85 29s -> 6.33s Change-Id: Idd07d45e419175af7d62f6035ba51be841ebc074 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java M fe/src/main/java/org/apache/impala/analysis/SelectList.java M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java A fe/src/main/java/org/apache/impala/analysis/TableConstraintRewriter.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/SqlConstraints.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/bin/compute-table-stats.sh M testdata/data/child_table.txt M testdata/data/parent_table.txt M testdata/data/parent_table_2.txt M testdata/datasets/functional/functional_schema_template.sql A testdata/workloads/functional-planner/queries/PlannerTest/constraint-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q25.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q28.test M
[Impala-ASF-CR] IMPALA-6671: [WIP] Skip locked tables from topic updates
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: [WIP] Skip locked tables from topic updates .. Patch Set 1: (6 comments) Thanks for working on this Vihang, it will be a life saver. http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1294 PS1, Line 1294: true A freeform boolean is hard to read, best to add an inline comment , true /* tryLock */ or make it an Enum. http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1316 PS1, Line 1316: topicUpdateTblLockMaxWaitTimeMs I could be misunderstanding the code, but isn't 7200 seconds too long to wait before skipping the update? http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1325 PS1, Line 1325: return; I hope the updateTblPendingVersion() rarely exceeds the number of attempts, but if it does throw an exception here, wouldn't it be better to try something else. http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1352 PS1, Line 1352: if (attempt >= Table.NUMBER_OF_ATTEMPTS_TO_UPDATE_VERSION) { : throw new CatalogException("Exhausted max number of attempts " : + Table.NUMBER_OF_ATTEMPTS_TO_UPDATE_VERSION : + " to update the pending table version"); : } Wouldn't we want to try something more graceful in this case. How would the end user recover from this exception. http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2662 PS1, Line 2662: LOG.info(String : .format( : "%s the pending catalog version of table %s from %s to %s%s.", : (success ? "Bumped up" : "Attempt to bump up"), getFullName(), : expectedPendingVersion, : newCatalogVersion, success ? "" : " failed")); LOG Debug? Isn't this too frequent. http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2677 PS1, Line 2677: getAndSet Why set it to -1? I haven't been able to convince myself that setting it -1 wouldn't prevent another caller from setting the catalog version to something older than intended. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 07 Oct 2020 15:31:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10262: RPM/DEB Packaging Support
Shant Hovsepian has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16612 Change subject: IMPALA-10262: RPM/DEB Packaging Support .. IMPALA-10262: RPM/DEB Packaging Support cmake -DBUILD_PACKAGES=ON make package Change-Id: I535e24e151371f5f594c7a8a6d2d585d18cb694f --- M CMakeLists.txt M be/CMakeLists.txt M be/src/service/CMakeLists.txt 3 files changed, 87 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/16612/1 -- To view, visit http://gerrit.cloudera.org:8080/16612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I535e24e151371f5f594c7a8a6d2d585d18cb694f Gerrit-Change-Number: 16612 Gerrit-PatchSet: 1 Gerrit-Owner: Shant Hovsepian
[Impala-ASF-CR] (WIP) IMPALA-10262: RPM/DEB Packaging Support
Hello Quanlong Huang, Grant Henke, Tim Armstrong, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16612 to look at the new patch set (#2). Change subject: (WIP) IMPALA-10262: RPM/DEB Packaging Support .. (WIP) IMPALA-10262: RPM/DEB Packaging Support cmake -DBUILD_PACKAGES=ON make package Change-Id: I535e24e151371f5f594c7a8a6d2d585d18cb694f --- M CMakeLists.txt M be/CMakeLists.txt M be/src/service/CMakeLists.txt 3 files changed, 87 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/16612/2 -- To view, visit http://gerrit.cloudera.org:8080/16612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I535e24e151371f5f594c7a8a6d2d585d18cb694f Gerrit-Change-Number: 16612 Gerrit-PatchSet: 2 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10262: RPM/DEB Packaging Support
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16612 ) Change subject: IMPALA-10262: RPM/DEB Packaging Support .. Patch Set 1: Work in progress, just wanted to see if anyone has any objections otherwise I'll complete the work. Installing to /opt/impala seems reasonable or is /usr better? TODO: Test that the packages are runnable. Test RPM Make version info a dependency in cmake. When disabled make sure the build isn't slowed down due to the extra "install" step that is now available. -- To view, visit http://gerrit.cloudera.org:8080/16612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I535e24e151371f5f594c7a8a6d2d585d18cb694f Gerrit-Change-Number: 16612 Gerrit-PatchSet: 1 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 19 Oct 2020 20:16:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7310: Partial fix for NDV cardinality with NULLs.
Hello Aman Sinha, Qifan Chen, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16349 to look at the new patch set (#3). Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. .. IMPALA-7310: Partial fix for NDV cardinality with NULLs. This fix just handles the case where a column's cardinality is zero however it's nullable and we have null stats to indicate there are null values, therefore we adjust the cardinality from 0 to 1. The cardinality of zero was especially problematic when calculating cardinalities for multiple predicates with multiplication. The 0 would propagate up the plan tree and result in poor plan choices such as always using broadcast joins where shuffle would've been more optimal. Testing: * 26 Node TPC-DS 30TB run had better plans for Q4 and Q11 - Q4 172s -> 80s - Q11 103s -> 77s * CardinalityTest Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 --- M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test 6 files changed, 766 insertions(+), 766 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16349/3 -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 3 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10099: Push down DISTINCT for SetOperations
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16350 to look at the new patch set (#2). Change subject: IMPALA-10099: Push down DISTINCT for SetOperations .. IMPALA-10099: Push down DISTINCT for SetOperations INTERSECT/EXCEPT are not duplicate preserving operations. The distinct aggregations can happen in each operand, the leftmost operand only, or after all the operands in a separate aggregation step. This change pushes the distinct aggregation down to the leftmost operand in cases where there are no analytic functions, or when a distinct or grouping operation already eliminates duplicates. In general DISTINCT placement such as in this case should be done throughout the entire plan tree in a cost based manner as described in IMPALA-5260 Testing: * TpcdsPlannerTest * PlannerTest * TPC-DS 30TB Perf run for any affected queries - Q14-1 180s -> 150s - Q14-2 109s -> 90s - Q8 no significant change * SetOperation Planner Tests * Analyzer tests * Tpcds Functional Workload Change-Id: Ia248f1595df2ab48fbe70c778c7c32bde5c518a5 --- M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M testdata/workloads/functional-planner/queries/PlannerTest/empty.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test 6 files changed, 1,805 insertions(+), 1,806 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/16350/2 -- To view, visit http://gerrit.cloudera.org:8080/16350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia248f1595df2ab48fbe70c778c7c32bde5c518a5 Gerrit-Change-Number: 16350 Gerrit-PatchSet: 2 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10095: Include query plan tests for all of TPC-DS
Hello Aman Sinha, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16345 to look at the new patch set (#7). Change subject: IMPALA-10095: Include query plan tests for all of TPC-DS .. IMPALA-10095: Include query plan tests for all of TPC-DS Added TpcdsPlannerTest to include each TPC-DS query as a separate plan test file. Removed the previous tpcds-all test file. This means when running only PlannerTest no TPC-DS plans are checked, however as part of a full frontend test run the TpcdsPlannerTest will be included. Runs with cardinality and resource checks, as well as using parquet tables to include predicate pushdowns. Change-Id: Ibaf40d8b783be1dc7b62ba3269feb034cb8047da --- M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java D testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q02.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q21.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q25.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q26.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q28.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q29.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q31.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q32.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q34.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q35a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q36.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q37.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q38.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q40.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q41.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q43.test A
[Impala-ASF-CR] IMPALA-10095: Include query plan tests for all of TPC-DS
Hello Aman Sinha, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16345 to look at the new patch set (#6). Change subject: IMPALA-10095: Include query plan tests for all of TPC-DS .. IMPALA-10095: Include query plan tests for all of TPC-DS Added TpcdsPlannerTest to include each TPC-DS query as a separate plan test file. Removed the previous tpcds-all test file. This means when running only PlannerTest no TPC-DS plans are checked, however as part of a full frontend test run the TpcdsPlannerTest will be included. Runs with cardinality and resource checks, as well as using parquet tables to include predicate pushdowns. Change-Id: Ibaf40d8b783be1dc7b62ba3269feb034cb8047da --- M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java D testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q02.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q21.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q25.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q26.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q28.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q29.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q31.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q32.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q34.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q35a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q36.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q37.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q38.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q40.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q41.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q43.test A
[Impala-ASF-CR] IMPALA-10095: Include query plan tests for all of TPC-DS
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16345 ) Change subject: IMPALA-10095: Include query plan tests for all of TPC-DS .. Patch Set 6: Too many parallel CR had some things sneak into a git add that shouldn't have made it just yet. -- To view, visit http://gerrit.cloudera.org:8080/16345 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibaf40d8b783be1dc7b62ba3269feb034cb8047da Gerrit-Change-Number: 16345 Gerrit-PatchSet: 6 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 25 Aug 2020 23:26:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10095: Include query plan tests for all of TPC-DS
Hello Aman Sinha, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16345 to look at the new patch set (#8). Change subject: IMPALA-10095: Include query plan tests for all of TPC-DS .. IMPALA-10095: Include query plan tests for all of TPC-DS Added TpcdsPlannerTest to include each TPC-DS query as a separate plan test file. Removed the previous tpcds-all test file. This means when running only PlannerTest no TPC-DS plans are checked, however as part of a full frontend test run the TpcdsPlannerTest will be included. Runs with cardinality and resource checks, as well as using parquet tables to include predicate pushdowns. Change-Id: Ibaf40d8b783be1dc7b62ba3269feb034cb8047da --- M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java D testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q02.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q21.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q25.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q26.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q28.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q29.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q31.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q32.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q34.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q35a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q36.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q37.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q38.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q40.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q41.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q43.test A
[Impala-ASF-CR] IMPALA-10064: Support constant propagation for eligible range predicates
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16346 ) Change subject: IMPALA-10064: Support constant propagation for eligible range predicates .. Patch Set 8: (5 comments) http://gerrit.cloudera.org:8080/#/c/16346/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16346/7//COMMIT_MSG@7 PS7, Line 7: 10064 > Oops ! Changed it to 10064. Done http://gerrit.cloudera.org:8080/#/c/16346/7/fe/src/main/java/org/apache/impala/analysis/ConstantPredicateHandler.java File fe/src/main/java/org/apache/impala/analysis/ConstantPredicateHandler.java: http://gerrit.cloudera.org:8080/#/c/16346/7/fe/src/main/java/org/apache/impala/analysis/ConstantPredicateHandler.java@45 PS7, Line 45: ) { > Right..not needed. I had a version earlier using the arguments then forgot Done http://gerrit.cloudera.org:8080/#/c/16346/7/fe/src/main/java/org/apache/impala/analysis/ConstantPredicateHandler.java@65 PS7, Line 65: BinaryPredicate.IS_RANGE_PREDICATE.apply(bp); : if (!isRangeOp && !(bp.getOp() == BinaryPredicate.Operator.EQ)) continue; : SlotRef slotRef = bp.getBoundSlot(); : if (slotRef == null || !bp.getChild(1).isConsta > Makes sense to create a static api for this. I followed the com.google.com I was thinking just a simple method like BinaryPredicate::isEquivalence() but this works too. Thanks for the cleanup! http://gerrit.cloudera.org:8080/#/c/16346/7/fe/src/main/java/org/apache/impala/analysis/ConstantPredicateHandler.java@164 PS7, Line 164: & > Done Done http://gerrit.cloudera.org:8080/#/c/16346/7/fe/src/main/java/org/apache/impala/analysis/ConstantPredicateHandler.java@180 PS7, Line 180: : : > Done Done -- To view, visit http://gerrit.cloudera.org:8080/16346 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I811a1f8d605c27c7704d7fc759a91510c6db3c2b Gerrit-Change-Number: 16346 Gerrit-PatchSet: 8 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 28 Aug 2020 07:25:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7310: Partial fix for NDV cardinality with NULLs.
Hello Aman Sinha, Qifan Chen, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16349 to look at the new patch set (#8). Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. .. IMPALA-7310: Partial fix for NDV cardinality with NULLs. This fix just handles the case where a column's cardinality is zero however it's nullable and we have null stats to indicate there are null values, therefore we adjust the cardinality from 0 to 1. The cardinality of zero was especially problematic when calculating cardinalities for multiple predicates with multiplication. The 0 would propagate up the plan tree and result in poor plan choices such as always using broadcast joins where shuffle would've been more optimal. Testing: * 26 Node TPC-DS 30TB run had better plans for Q4 and Q11 - Q4 172s -> 80s - Q11 103s -> 77s * CardinalityTest * TpcdsPlannerTest Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 --- M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-date.test 9 files changed, 789 insertions(+), 782 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16349/8 -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 8 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7310: Partial fix for NDV cardinality with NULLs.
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16349 ) Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. .. Patch Set 8: This is getting a little uglier now. Turns out pushing the check down to the stats level exposed holes in the logic elsewhere. Nulls are included in the column ndv stats is it is a Boolean type and if the column is a partitioning column with a null value. I'd be happy to keep this change in SlotRef where it's only affecting selectivity calculations or as in this updated patch for the partitioning case bypass the null adjustments. -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 8 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 28 Aug 2020 07:02:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10110: bloom filter target fpp query option
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16377 ) Change subject: IMPALA-10110: bloom filter target fpp query option .. Patch Set 5: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/16377/4/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/16377/4/be/src/service/query-options.h@50 PS4, Line 50: RUNTIME_FILTER_ERR > It's a bit opaque either way, but consistency makes sense. Made the change. Consistently opaque! -- To view, visit http://gerrit.cloudera.org:8080/16377 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifb123a0ea1e0e95d95df9837c1f0222fd60361f3 Gerrit-Change-Number: 16377 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 28 Aug 2020 17:50:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7310: Partial fix for NDV cardinality with NULLs.
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16349 ) Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. .. Patch Set 8: (1 comment) > Patch Set 8: > > I wonder if the holes exposed by the current change can be remedied, for the > time being, by suppling a checkNull argument default to false which will > bypass the new logic. We call the new method with checkNull set to true in > SlotRef only. In the long run, we can fix these other issues. > > I am OK with reverting the logic to your previous version too if the chance > to hit the same problem for other expressions is low. Yeah having a check in slotRef to then just call a special method in stats is just not leaking the abstraction even more. I'd rather than just keep the change localized in slotref. http://gerrit.cloudera.org:8080/#/c/16349/8/testdata/workloads/functional-query/queries/QueryTest/compute-stats-date.test File testdata/workloads/functional-query/queries/QueryTest/compute-stats-date.test: http://gerrit.cloudera.org:8080/#/c/16349/8/testdata/workloads/functional-query/queries/QueryTest/compute-stats-date.test@32 PS8, Line 32: 'd','DATE',1,22,4,4,-1,-1 > Is this change right? I thought the column stats themselves were not meant Yeah this was originally why we only had the correction in the slotref so it benefits selectivity but nothing else. Having the correction in the stats class will show up everywhere stats are used. -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 8 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 29 Aug 2020 00:40:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7310: Partial fix for NDV cardinality with NULLs.
Hello Aman Sinha, Qifan Chen, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16349 to look at the new patch set (#9). Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. .. IMPALA-7310: Partial fix for NDV cardinality with NULLs. This fix just handles the case where a column's cardinality is zero however it's nullable and we have null stats to indicate there are null values, therefore we adjust the cardinality from 0 to 1. The cardinality of zero was especially problematic when calculating cardinalities for multiple predicates with multiplication. The 0 would propagate up the plan tree and result in poor plan choices such as always using broadcast joins where shuffle would've been more optimal. Testing: * 26 Node TPC-DS 30TB run had better plans for Q4 and Q11 - Q4 172s -> 80s - Q11 103s -> 77s * CardinalityTest * TpcdsPlannerTest Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 --- M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test 7 files changed, 787 insertions(+), 775 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16349/9 -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 9 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10034: Add remaining TPC-DS queries to workload.
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16280 ) Change subject: IMPALA-10034: Add remaining TPC-DS queries to workload. .. Patch Set 5: (2 comments) > Patch Set 4: > > Agree with Aman's comments about the planner tests and the changes from the > official versions... Added comments on the official variants there are only two at this point. I have kept the unofficial variants we used in the past along with the official versions just for the sake of catching an potential regressions. Filed IMPALA-10095 for the planner tests. http://gerrit.cloudera.org:8080/#/c/16280/4/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q23-1.test File testdata/workloads/tpcds/queries/tpcds-decimal_v2-q23-1.test: http://gerrit.cloudera.org:8080/#/c/16280/4/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q23-1.test@33 PS4, Line 33: select sum(sales) > The TPCDS query is written as two queries in one file. We separate it out l Done http://gerrit.cloudera.org:8080/#/c/16280/4/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q24-1.test File testdata/workloads/tpcds/queries/tpcds-decimal_v2-q24-1.test: http://gerrit.cloudera.org:8080/#/c/16280/4/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q24-1.test@45 PS4, Line 45: where i_color = 'peach' > Same as above, the TPC-DS specification has it written as two semi colon se Done -- To view, visit http://gerrit.cloudera.org:8080/16280 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id5436689390f149694f14e6da1df624de4f5f7ad Gerrit-Change-Number: 16280 Gerrit-PatchSet: 5 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 19 Aug 2020 14:28:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10034: Add remaining TPC-DS queries to workload.
Hello Aman Sinha, Fang-Yu Rao, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16280 to look at the new patch set (#5). Change subject: IMPALA-10034: Add remaining TPC-DS queries to workload. .. IMPALA-10034: Add remaining TPC-DS queries to workload. Include remaining TPC-DS queries to the testdata workload definition. Q8 and Q38 were using non standard variants, those have been replaced by the official query versions. Q35 is using an official variant. Had to escape a table alias in Q90 as we treat 'AT' as a reserved keyword. Change-Id: Id5436689390f149694f14e6da1df624de4f5f7ad --- M testdata/workloads/tpcds/queries/tpcds-decimal_v2-q10a.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q23-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q23-2.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q24-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q24-2.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q28.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q35a.test D testdata/workloads/tpcds/queries/tpcds-decimal_v2-q38-rewrite.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q38.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q44.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q49.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q66.test M testdata/workloads/tpcds/queries/tpcds-decimal_v2-q8.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q87.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q90.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q93.test M tests/query_test/test_tpcds_queries.py M tests/util/parse_util.py 18 files changed, 1,250 insertions(+), 106 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/16280/5 -- To view, visit http://gerrit.cloudera.org:8080/16280 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id5436689390f149694f14e6da1df624de4f5f7ad Gerrit-Change-Number: 16280 Gerrit-PatchSet: 5 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10034: Add remaining TPC-DS queries to workload.
Hello Aman Sinha, Fang-Yu Rao, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16280 to look at the new patch set (#7). Change subject: IMPALA-10034: Add remaining TPC-DS queries to workload. .. IMPALA-10034: Add remaining TPC-DS queries to workload. Include remaining TPC-DS queries to the testdata workload definition. Q8 and Q38 were using non standard variants, those have been replaced by the official query versions. Q35 is using an official variant. Had to escape a table alias in Q90 as we treat 'AT' as a reserved keyword. Change-Id: Id5436689390f149694f14e6da1df624de4f5f7ad --- M testdata/workloads/tpcds/queries/tpcds-decimal_v2-q10a.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q23-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q23-2.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q24-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q24-2.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q28.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q35a.test D testdata/workloads/tpcds/queries/tpcds-decimal_v2-q38-rewrite.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q38.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q44.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q49.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q66.test M testdata/workloads/tpcds/queries/tpcds-decimal_v2-q8.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q87.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q90.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q93.test M tests/query_test/test_tpcds_queries.py M tests/util/parse_util.py 18 files changed, 1,252 insertions(+), 105 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/16280/7 -- To view, visit http://gerrit.cloudera.org:8080/16280 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id5436689390f149694f14e6da1df624de4f5f7ad Gerrit-Change-Number: 16280 Gerrit-PatchSet: 7 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7310: Partial fix for NDV cardinality with NULLs.
Hello Aman Sinha, Qifan Chen, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16349 to look at the new patch set (#2). Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. .. IMPALA-7310: Partial fix for NDV cardinality with NULLs. This fix just handles the case where a column's cardinality is zero however it's nullable and we have null stats to indicate there are null values, therefore we adjust the cardinality from 0 to 1. The cardinality of zero was especially problematic when calculating cardinalities for multiple predicates with multiplication. The 0 would propagate up the plan tree and result in poor plan choices such as always using broadcast joins where shuffle would've been more optimal. Testing: * 26 Node TPC-DS 30TB run had better plans for Q4 and Q11 - Q4 172s -> 80s - Q11 103s -> 77s Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 --- M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test 6 files changed, 766 insertions(+), 766 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16349/2 -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 2 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10099: Push down DISTINCT for SetOperations
Shant Hovsepian has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16350 Change subject: IMPALA-10099: Push down DISTINCT for SetOperations .. IMPALA-10099: Push down DISTINCT for SetOperations Testing: * TpcdsPlannerTest * TPC-DS 30TB Perf run for any affected queries - Q14-1 180s -> 150s - Q14-2 109s -> 90s - Q8 no significant change * SetOperation Planner Tests * Analyzer tests * Tpcds Functional Workload Change-Id: Ia248f1595df2ab48fbe70c778c7c32bde5c518a5 --- M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test 5 files changed, 1,787 insertions(+), 1,788 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/16350/1 -- To view, visit http://gerrit.cloudera.org:8080/16350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia248f1595df2ab48fbe70c778c7c32bde5c518a5 Gerrit-Change-Number: 16350 Gerrit-PatchSet: 1 Gerrit-Owner: Shant Hovsepian
[Impala-ASF-CR] IMPALA-10095: Include query plan tests for all of TPC-DS
Shant Hovsepian has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16345 Change subject: IMPALA-10095: Include query plan tests for all of TPC-DS .. IMPALA-10095: Include query plan tests for all of TPC-DS Added TpcdsPlannerTest to include each TPC-DS query as a separate plan test file. Removed the previous tpcds-all test file. This means when running only PlannerTest no TPC-DS plans are checked, however as part of a full frontend test run the TpcdsPlannerTest will be included. Runs with cardinality and resource checks, as well as using parquet tables to include predicate pushdowns. Change-Id: Ibaf40d8b783be1dc7b62ba3269feb034cb8047da --- M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java D testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q02.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q21.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q25.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q26.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q28.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q29.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q31.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q32.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q34.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q35a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q36.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q37.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q38.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q40.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q41.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q43.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test A
[Impala-ASF-CR] IMPALA-7310: Partial fix for NDV cardinality with NULLs.
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16349 ) Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. .. Patch Set 1: This is a bit of a lazy fix, handling only the case where ndv is 0 but there are null stats. I tried updating the cardinality with null info in all cases created some big plan changes across the board, to do this change justice we'd need to fix cardinality and selectivity calculation to factor out the number of null values in cases where a predicate is null filtering. The IS NOT NULL predicate does this explicitly but we not other predicates. I welcome people's input on prioritizing the general fix, just makes for LOTS of plan diffs to review. If we're okay with just this partial fix I'll make a separate JIRA for it. Also still expecting more test changes, having trouble with loading data with Tez and memory consumption as well as some more Hbase issues. -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 1 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 21 Aug 2020 06:08:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7782: fix constant NOT IN subqueries that can return 0 rows
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16338 ) Change subject: IMPALA-7782: fix constant NOT IN subqueries that can return 0 rows .. Patch Set 1: Code-Review+1 Nice catch. -- To view, visit http://gerrit.cloudera.org:8080/16338 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I66c726f0f66ce2f609e6ba44057191f5929a67fc Gerrit-Change-Number: 16338 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 20 Aug 2020 01:10:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10034: Add remaining TPC-DS queries to workload.
Hello Aman Sinha, Fang-Yu Rao, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16280 to look at the new patch set (#6). Change subject: IMPALA-10034: Add remaining TPC-DS queries to workload. .. IMPALA-10034: Add remaining TPC-DS queries to workload. Include remaining TPC-DS queries to the testdata workload definition. Q8 and Q38 were using non standard variants, those have been replaced by the official query versions. Q35 is using an official variant. Had to escape a table alias in Q90 as we treat 'AT' as a reserved keyword. Change-Id: Id5436689390f149694f14e6da1df624de4f5f7ad --- M testdata/workloads/tpcds/queries/tpcds-decimal_v2-q10a.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q23-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q23-2.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q24-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q24-2.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q28.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q35a.test D testdata/workloads/tpcds/queries/tpcds-decimal_v2-q38-rewrite.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q38.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q44.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q49.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q66.test M testdata/workloads/tpcds/queries/tpcds-decimal_v2-q8.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q87.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q90.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q93.test M tests/query_test/test_tpcds_queries.py M tests/util/parse_util.py 18 files changed, 1,250 insertions(+), 106 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/16280/6 -- To view, visit http://gerrit.cloudera.org:8080/16280 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id5436689390f149694f14e6da1df624de4f5f7ad Gerrit-Change-Number: 16280 Gerrit-PatchSet: 6 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7310: Partial fix for NDV cardinality with NULLs.
Shant Hovsepian has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16349 Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. .. IMPALA-7310: Partial fix for NDV cardinality with NULLs. This fix just handles the case where a column's cardinality is zero however it's nullable and we have null stats to indicate there are null values, therefore we adjust the cardinality from 0 to 1. The cardinality of zero was especially problematic when calculating cardinalities for multiple predicates with multiplication. The 0 would propagate up the plan tree and result in poor plan choices such as always using broadcast joins where shuffle would've been more optimal. Testing: * 26 Node TPC-DS 30TB run had better plans for Q4 and Q11 - Q4 172s -> 80s - Q11 103s -> 77s Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 --- M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test 4 files changed, 751 insertions(+), 731 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16349/1 -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 1 Gerrit-Owner: Shant Hovsepian
[Impala-ASF-CR] IMPALA-10099: Push down DISTINCT in Set operations
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16350 to look at the new patch set (#3). Change subject: IMPALA-10099: Push down DISTINCT in Set operations .. IMPALA-10099: Push down DISTINCT in Set operations INTERSECT/EXCEPT are not duplicate preserving operations. The distinct aggregations can happen in each operand, the leftmost operand only, or after all the operands in a separate aggregation step. Except for a couple special cases we would use the last strategy most often. This change pushes the distinct aggregation down to the leftmost operand in cases where there are no analytic functions, or when a distinct or grouping operation already eliminates duplicates. In general DISTINCT placement such as in this case should be done throughout the entire plan tree in a cost based manner as described in IMPALA-5260 Testing: * TpcdsPlannerTest * PlannerTest * TPC-DS 30TB Perf run for any affected queries - Q14-1 180s -> 150s - Q14-2 109s -> 90s - Q8 no significant change * SetOperation Planner Tests * Analyzer tests * Tpcds Functional Workload Change-Id: Ia248f1595df2ab48fbe70c778c7c32bde5c518a5 --- M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M testdata/workloads/functional-planner/queries/PlannerTest/empty.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test 6 files changed, 1,806 insertions(+), 1,806 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/16350/3 -- To view, visit http://gerrit.cloudera.org:8080/16350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia248f1595df2ab48fbe70c778c7c32bde5c518a5 Gerrit-Change-Number: 16350 Gerrit-PatchSet: 3 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-7310: Partial fix for NDV cardinality with NULLs.
Hello Aman Sinha, Qifan Chen, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16349 to look at the new patch set (#5). Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. .. IMPALA-7310: Partial fix for NDV cardinality with NULLs. This fix just handles the case where a column's cardinality is zero however it's nullable and we have null stats to indicate there are null values, therefore we adjust the cardinality from 0 to 1. The cardinality of zero was especially problematic when calculating cardinalities for multiple predicates with multiplication. The 0 would propagate up the plan tree and result in poor plan choices such as always using broadcast joins where shuffle would've been more optimal. Testing: * 26 Node TPC-DS 30TB run had better plans for Q4 and Q11 - Q4 172s -> 80s - Q11 103s -> 77s * CardinalityTest * TpcdsPlannerTest Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 --- M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test 7 files changed, 780 insertions(+), 774 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16349/5 -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 5 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10046: Support constant propagation for eligible range predicates
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16346 ) Change subject: IMPALA-10046: Support constant propagation for eligible range predicates .. Patch Set 7: (7 comments) Nice addition. http://gerrit.cloudera.org:8080/#/c/16346/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16346/7//COMMIT_MSG@7 PS7, Line 7: 10046 Wrong JIRA? http://gerrit.cloudera.org:8080/#/c/16346/7/fe/src/main/java/org/apache/impala/analysis/ConstantPredicateHandler.java File fe/src/main/java/org/apache/impala/analysis/ConstantPredicateHandler.java: http://gerrit.cloudera.org:8080/#/c/16346/7/fe/src/main/java/org/apache/impala/analysis/ConstantPredicateHandler.java@45 PS7, Line 45: List conjuncts, BitSet candidates Are these constructor arguments required? Since you keep some state in this class it might be cleaner to initialize an instance just for specific arguments here and then now require arguments for the methods below. http://gerrit.cloudera.org:8080/#/c/16346/7/fe/src/main/java/org/apache/impala/analysis/ConstantPredicateHandler.java@65 PS7, Line 65: (bp.getOp() == BinaryPredicate.Operator.LE || : bp.getOp() == BinaryPredicate.Operator.LT || : bp.getOp() == BinaryPredicate.Operator.GE || : bp.getOp() == BinaryPredicate.Operator.GT); *Suggestion* If you want to make this check a static public method in BinaryPredicate it would be nice. You could clean up some code in HdfsScanNode that is similar for MinMax Predicates. http://gerrit.cloudera.org:8080/#/c/16346/7/fe/src/main/java/org/apache/impala/analysis/ConstantPredicateHandler.java@164 PS7, Line 164: This can be static if you choose. http://gerrit.cloudera.org:8080/#/c/16346/7/fe/src/main/java/org/apache/impala/analysis/ConstantPredicateHandler.java@180 PS7, Line 180: } catch (Exception e) { : throw new IllegalStateException("Failed analysis on new predicate", e); : } Not sure that this is a very useful exception. You could do: newPred.analyzeNoThrow(analyzer) http://gerrit.cloudera.org:8080/#/c/16346/7/testdata/datasets/functional/schema_constraints.csv File testdata/datasets/functional/schema_constraints.csv: http://gerrit.cloudera.org:8080/#/c/16346/7/testdata/datasets/functional/schema_constraints.csv@22 PS7, Line 22: table_format:parquet Usually these small test table are text types, any particular reason for making it parquet? http://gerrit.cloudera.org:8080/#/c/16346/7/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test: http://gerrit.cloudera.org:8080/#/c/16346/7/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test@461 PS7, Line 461: timestamp_col <= '2010-12-01'; There is a NormalizeBinaryPredicate rule that should rewrite "const " into " const" but I couldn't tell if it runs before or after propagateConstants() might be good to test that case as well. -- To view, visit http://gerrit.cloudera.org:8080/16346 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I811a1f8d605c27c7704d7fc759a91510c6db3c2b Gerrit-Change-Number: 16346 Gerrit-PatchSet: 7 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 27 Aug 2020 23:35:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10099: Push down DISTINCT in Set operations
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16350 ) Change subject: IMPALA-10099: Push down DISTINCT in Set operations .. Patch Set 3: Small optimization for the set operation rewrites. -- To view, visit http://gerrit.cloudera.org:8080/16350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia248f1595df2ab48fbe70c778c7c32bde5c518a5 Gerrit-Change-Number: 16350 Gerrit-PatchSet: 3 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 27 Aug 2020 23:37:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7310: Partial fix for NDV cardinality with NULLs.
Hello Aman Sinha, Qifan Chen, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16349 to look at the new patch set (#6). Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. .. IMPALA-7310: Partial fix for NDV cardinality with NULLs. This fix just handles the case where a column's cardinality is zero however it's nullable and we have null stats to indicate there are null values, therefore we adjust the cardinality from 0 to 1. The cardinality of zero was especially problematic when calculating cardinalities for multiple predicates with multiplication. The 0 would propagate up the plan tree and result in poor plan choices such as always using broadcast joins where shuffle would've been more optimal. Testing: * 26 Node TPC-DS 30TB run had better plans for Q4 and Q11 - Q4 172s -> 80s - Q11 103s -> 77s * CardinalityTest * TpcdsPlannerTest Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 --- M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test 7 files changed, 779 insertions(+), 776 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16349/6 -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 6 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7310: Partial fix for NDV cardinality with NULLs.
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16349 ) Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/16349/5/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java File fe/src/main/java/org/apache/impala/catalog/ColumnStats.java: http://gerrit.cloudera.org:8080/#/c/16349/5/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java@210 PS5, Line 210: public long getNumDistinctValues() > Nice! thanks! http://gerrit.cloudera.org:8080/#/c/16349/5/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java@214 PS5, Line 214: } > I talked to you directly about this I agree, the logic is messy like this. Will fix this part. More reason to handle the general case and do the adjustment during updates to the stats. -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 6 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 27 Aug 2020 21:08:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10110: allow setting bloom filter fpp
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16377 ) Change subject: IMPALA-10110: allow setting bloom filter fpp .. Patch Set 4: Code-Review+1 (2 comments) Non critical suggestions. Thanks for the fix! http://gerrit.cloudera.org:8080/#/c/16377/4/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/16377/4/be/src/service/query-options.h@50 PS4, Line 50: RUNTIME_FILTER_FPP I don't feel strongly about this so either way. "FPP" is kind of an opaque technical term. Since this might be user facing and we've got a flag already called "max_filter_error_rate" which is wordier as a query option but incorporating FILTER_ERROR_RATE instead of FPP might make the option more self documenting. http://gerrit.cloudera.org:8080/#/c/16377/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/16377/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@146 PS4, Line 146: max_filter_error_rate Would you mind updating the flag doc to mention the query option as an override? https://github.com/apache/impala/blob/03f2b559c31af7fc11165cf3b00876900e234663/be/src/runtime/runtime-filter-bank.cc#L56 -- To view, visit http://gerrit.cloudera.org:8080/16377 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifb123a0ea1e0e95d95df9837c1f0222fd60361f3 Gerrit-Change-Number: 16377 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 27 Aug 2020 22:24:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10095: Include query plan tests for all of TPC-DS
Hello Aman Sinha, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16345 to look at the new patch set (#4). Change subject: IMPALA-10095: Include query plan tests for all of TPC-DS .. IMPALA-10095: Include query plan tests for all of TPC-DS Added TpcdsPlannerTest to include each TPC-DS query as a separate plan test file. Removed the previous tpcds-all test file. This means when running only PlannerTest no TPC-DS plans are checked, however as part of a full frontend test run the TpcdsPlannerTest will be included. Runs with cardinality and resource checks, as well as using parquet tables to include predicate pushdowns. Change-Id: Ibaf40d8b783be1dc7b62ba3269feb034cb8047da --- M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java D testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q02.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q21.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q25.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q26.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q28.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q29.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q31.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q32.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q34.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q35a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q36.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q37.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q38.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q40.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q41.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q43.test A
[Impala-ASF-CR] IMPALA-7310: Partial fix for NDV cardinality with NULLs.
Hello Aman Sinha, Qifan Chen, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16349 to look at the new patch set (#4). Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. .. IMPALA-7310: Partial fix for NDV cardinality with NULLs. This fix just handles the case where a column's cardinality is zero however it's nullable and we have null stats to indicate there are null values, therefore we adjust the cardinality from 0 to 1. The cardinality of zero was especially problematic when calculating cardinalities for multiple predicates with multiplication. The 0 would propagate up the plan tree and result in poor plan choices such as always using broadcast joins where shuffle would've been more optimal. Testing: * 26 Node TPC-DS 30TB run had better plans for Q4 and Q11 - Q4 172s -> 80s - Q11 103s -> 77s * CardinalityTest * TpcdsPlannerTest Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 --- M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test 6 files changed, 766 insertions(+), 766 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16349/4 -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 4 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7310: Partial fix for NDV cardinality with NULLs.
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16349 ) Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. .. Patch Set 4: (2 comments) > Patch Set 3: > > (1 comment) > > I think this makes sense to me - the only question really would be whether > this should be broader and increase the NDV in all cases where we know there > are nulls. I think fixing this really bad case first is OK though. Yes I tried the general fix but got some bad plans. The extra change given like booleans, we know how many NULL there are in the stats predicates with value checks that definitely filter out NULLs need should: 1. Not include the +1 for null in the selectivity. 2. Remove the null value count from cardinality. That's just a much larger change in terms of plan diffs and corner cases, so wanted to get this worst case "base case" in sooner. http://gerrit.cloudera.org:8080/#/c/16349/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/16349/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@96 PS3, Line 96: if (numDistinctValues_ == 0 && desc_.getIsNullable() && desc_.getStats() != null > Just wonder if the new logic here should be moved to ::getNumDistinctValues I was thinking about that but we'd want to check the slot nullability which is awkward to do from with the stats object. I'm going to leave it in place like this for now. http://gerrit.cloudera.org:8080/#/c/16349/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@97 PS3, Line 97: && (desc_.getStats().hasNulls() || > It might be safer to assume that there are nulls in the absence of null sta Yes it would be rare at the least but should be an easy check. -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 4 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 25 Aug 2020 02:49:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10095: Include query plan tests for all of TPC-DS
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16345 ) Change subject: IMPALA-10095: Include query plan tests for all of TPC-DS .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/16345/3/testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test: http://gerrit.cloudera.org:8080/#/c/16345/3/testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test@1 PS3, Line 1: # TPCDS-Q6 > TPCDS Done http://gerrit.cloudera.org:8080/#/c/16345/3/testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test: http://gerrit.cloudera.org:8080/#/c/16345/3/testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test@1 PS3, Line 1: > a instead of -1 Done -- To view, visit http://gerrit.cloudera.org:8080/16345 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibaf40d8b783be1dc7b62ba3269feb034cb8047da Gerrit-Change-Number: 16345 Gerrit-PatchSet: 5 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 25 Aug 2020 00:50:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10095: Include query plan tests for all of TPC-DS
Hello Aman Sinha, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16345 to look at the new patch set (#5). Change subject: IMPALA-10095: Include query plan tests for all of TPC-DS .. IMPALA-10095: Include query plan tests for all of TPC-DS Added TpcdsPlannerTest to include each TPC-DS query as a separate plan test file. Removed the previous tpcds-all test file. This means when running only PlannerTest no TPC-DS plans are checked, however as part of a full frontend test run the TpcdsPlannerTest will be included. Runs with cardinality and resource checks, as well as using parquet tables to include predicate pushdowns. Change-Id: Ibaf40d8b783be1dc7b62ba3269feb034cb8047da --- M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java D testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q02.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q21.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q25.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q26.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q28.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q29.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q31.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q32.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q34.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q35a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q36.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q37.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q38.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39a.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39b.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q40.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q41.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test A testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q43.test A
[Impala-ASF-CR] IMPALA-7310: Partial fix for NDV cardinality with NULLs.
Hello Aman Sinha, Qifan Chen, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16349 to look at the new patch set (#10). Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. .. IMPALA-7310: Partial fix for NDV cardinality with NULLs. This fix just handles the case where a column's cardinality is zero however it's nullable and we have null stats to indicate there are null values, therefore we adjust the cardinality from 0 to 1. The cardinality of zero was especially problematic when calculating cardinalities for multiple predicates with multiplication. The 0 would propagate up the plan tree and result in poor plan choices such as always using broadcast joins where shuffle would've been more optimal. Testing: * 26 Node TPC-DS 30TB run had better plans for Q4 and Q11 - Q4 172s -> 80s - Q11 103s -> 77s * CardinalityTest * TpcdsPlannerTest Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 --- M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test 7 files changed, 787 insertions(+), 775 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16349/10 -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 10 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10112: Remove FpRateTooHigh() check for blom filter
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16499 ) Change subject: IMPALA-10112: Remove FpRateTooHigh() check for blom filter .. Patch Set 1: (4 comments) Thanks for implementing and testing this out Riza! http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@24 PS1, Line 24: inacurate nit: inaccurate http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@9 PS1, Line 9: This patch remove FpRateTooHigh() check for bloom filter that can : disable filter if the observed false-positive probability (FPP) rate is : higher than FLAGS_max_filter_error_rate. Such filter with high FPP rate : is still worth to evaluate for several reasons: : : 1. Partition filters are probably still worth evaluating even if there :are false positives, because it's cheap and eliminating a partition :is still beneficial. : 2. Runtime filters are dynamically disabled on the scan side if they are :ineffective. An always true filter is also still being evaluated and :not entirely free. : 3. The disabling is fairly unlikely to kick in for partitioned joins :because it's only applied to a small subset of the filter, before the :Or() operation. : 4. FpRateTooHigh() use num_build_rows to approximate actual FPP rate of :resulting filter. This can be inacurate because it does not take :account of duplicate values of the filter key on the build side. : : This patch also remove some tests in test_runtime_filters.py that check : cancellation of filters having high FPP rate. nit: The grammar here is a little hard to parse, might want to run through it again or pass it through a grammar checker? http://gerrit.cloudera.org:8080/#/c/16499/1//COMMIT_MSG@33 PS1, Line 33: little to no performance regression If you have the performance numbers handy it would be good to include them in the commit message to aid any future readers; however it's not critical if rerunning the benchmark is required. http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/16499/1/be/src/exec/partitioned-hash-join-builder.cc@a941 PS1, Line 941: Is the always_true_filter dead code now or is it used elsewhere? -- To view, visit http://gerrit.cloudera.org:8080/16499 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4 Gerrit-Change-Number: 16499 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 24 Sep 2020 03:36:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10185 Use bool stats for selectivity calculations.
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16462 to look at the new patch set (#2). Change subject: IMPALA-10185 Use bool stats for selectivity calculations. .. IMPALA-10185 Use bool stats for selectivity calculations. Factor in numTrues and numFalses stats when computing selectivity for boolean columns. Testing: * New test method in ExprCardinalityTest Change-Id: I95c1c7c915bf6bca13fe006c0531c33988187d12 --- M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java 2 files changed, 39 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/16462/2 -- To view, visit http://gerrit.cloudera.org:8080/16462 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I95c1c7c915bf6bca13fe006c0531c33988187d12 Gerrit-Change-Number: 16462 Gerrit-PatchSet: 2 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian
[Impala-ASF-CR] IMPALA-10185 Use bool stats for selectivity calculations.
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16462 ) Change subject: IMPALA-10185 Use bool stats for selectivity calculations. .. Patch Set 2: Just a small change nothing urgent. -- To view, visit http://gerrit.cloudera.org:8080/16462 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95c1c7c915bf6bca13fe006c0531c33988187d12 Gerrit-Change-Number: 16462 Gerrit-PatchSet: 2 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 24 Sep 2020 06:23:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5022 part 1: Implement core functions of outer join simplification
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16266 ) Change subject: IMPALA-5022 part 1: Implement core functions of outer join simplification .. Patch Set 22: Code-Review+1 Xianqing, very nice! This is a very helpful patch. -- To view, visit http://gerrit.cloudera.org:8080/16266 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa7804033fac68e93f33c387dc68ef67f803e93e Gerrit-Change-Number: 16266 Gerrit-PatchSet: 22 Gerrit-Owner: Xianqing He Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Xianqing He Gerrit-Comment-Date: Tue, 22 Sep 2020 12:24:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10112: Remove FpRateTooHigh() check for bloom filter
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16499 ) Change subject: IMPALA-10112: Remove FpRateTooHigh() check for bloom filter .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/16499 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id9f8f40764b4f6664cc81b0da428afea8e3588d4 Gerrit-Change-Number: 16499 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 Sep 2020 22:25:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8291: Show constraints in DESCRIBE FORMATTED
Hello Quanlong Huang, Vihang Karajgaonkar, David Rorke, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16428 to look at the new patch set (#2). Change subject: IMPALA-8291: Show constraints in DESCRIBE FORMATTED .. IMPALA-8291: Show constraints in DESCRIBE FORMATTED Support for displaying primary and foreign key constraints in describe formatted output. The output attempts to be as close to Hive's implementation as possible. Also includes constraint definitions for the TPC-DS test workload. Testing: * Fresh load of testdata * Metadata query tests comparing the output between Impala and Hive Change-Id: I676b69c465c46491f870d7fdc894e7474c030356 --- M fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M testdata/datasets/tpcds/tpcds_schema_template.sql M tests/metadata/test_metadata_query_statements.py 5 files changed, 788 insertions(+), 533 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/16428/2 -- To view, visit http://gerrit.cloudera.org:8080/16428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I676b69c465c46491f870d7fdc894e7474c030356 Gerrit-Change-Number: 16428 Gerrit-PatchSet: 2 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-9979: part 2: partitioned top-n
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16242 ) Change subject: IMPALA-9979: part 2: partitioned top-n .. Patch Set 17: Code-Review+1 (2 comments) I don't feel as strongly about refactoring TopNNode in the be as other reviewers. This is a very thorough implementation, bravo! http://gerrit.cloudera.org:8080/#/c/16242/17/be/src/exec/topn-node.cc File be/src/exec/topn-node.cc: http://gerrit.cloudera.org:8080/#/c/16242/17/be/src/exec/topn-node.cc@541 PS17, Line 541: sort(sorted_heaps.begin(), sorted_heaps.end(), : [](const PartitionHeapMap::iterator& left, : const PartitionHeapMap::iterator& right) { : int64_t left_discarded = left->second->num_tuples_discarded(); : int64_t right_discarded = right->second->num_tuples_discarded(); : if (left_discarded != right_discarded) { : return left_discarded < right_discarded; : } : return left->second->num_tuples_added_since_eviction() < : right->second->num_tuples_added_since_eviction(); : } Nice, very nice. http://gerrit.cloudera.org:8080/#/c/16242/17/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java File fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java: http://gerrit.cloudera.org:8080/#/c/16242/17/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java@82 PS17, Line 82: public static String RANK = "rank"; : public static String DENSERANK = "dense_rank"; : public static String ROWNUMBER = "row_number"; nit: a bit of an eye soar, would you mind moving the public members to the begining or end of the block. -- To view, visit http://gerrit.cloudera.org:8080/16242 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5 Gerrit-Change-Number: 16242 Gerrit-PatchSet: 17 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 15 Sep 2020 01:57:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7097 Print EC info in the query plan and profile
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16587 ) Change subject: IMPALA-7097 Print EC info in the query plan and profile .. Patch Set 7: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/16587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ea378914624a714fde820d290b3b9c43325c6a1 Gerrit-Change-Number: 16587 Gerrit-PatchSet: 7 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Shant Hovsepian Gerrit-Comment-Date: Mon, 26 Oct 2020 16:33:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9943, IMPALA-4974: INTERSECT and EXCEPT DISTINCT Support.
Hello David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16123 to look at the new patch set (#6). Change subject: IMPALA-9943, IMPALA-4974: INTERSECT and EXCEPT DISTINCT Support. .. IMPALA-9943, IMPALA-4974: INTERSECT and EXCEPT DISTINCT Support. INTERSECT and EXCEPT set operations are implemented as rewrites to joins. Currently only the DISTINCT qualified operators are implemented, not ALL qualified. The operator MINUS is supported as an alias for EXCEPT. We mimic Hive's non-standard implementation which treats all operators with the same precedence, as opposed to the SQL Standard of giving INTERSECT higher precedence. A new class SetOperationStmt was created to encompass the previous UnionStmt behavior. UnionStmt is preserved as a special case of union only operands to ensure compatibility with previous union planning behavior. Tests: * Added parser and analyzer tests. * Ensured no test failures or plan changes for union tests. * Added TPC-DS queries 14,38,87 to functional and planner tests. * Added functional tests test_intersect test_except Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java A fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test A testdata/workloads/functional-query/queries/QueryTest/except.test A testdata/workloads/functional-query/queries/QueryTest/intersect.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-2.test A testdata/workloads/tpcds/queries/tpcds-q14-1.test A testdata/workloads/tpcds/queries/tpcds-q14-2.test A testdata/workloads/tpcds/queries/tpcds-q38.test A testdata/workloads/tpcds/queries/tpcds-q87.test M testdata/workloads/tpch_nested/tpch_nested_core.csv M tests/query_test/test_queries.py M tests/query_test/test_tpcds_queries.py M tests/util/parse_util.py 29 files changed, 3,725 insertions(+), 790 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16123/6 -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 6 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR](asf-site) Add link to slack channel on community
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16024 ) Change subject: Add link to slack channel on community .. Patch Set 1: Code-Review+1 > Patch Set 1: > > Could we just use the-asf.slack.com and create an #impala channel? It looks > like that space is already active with other projects like Beam and I think > the space is blessed by ASF infra. The current slack channel has a decent number of people in it. Might not be worth the switch over to the-asf unless there is some kind of official policy? -- To view, visit http://gerrit.cloudera.org:8080/16024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-MessageType: comment Gerrit-Change-Id: Ibc05665607cc35a967c01660b6f4890ddd7e6a40 Gerrit-Change-Number: 16024 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 19 Jul 2020 12:04:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10034: Add remaining TPC-DS queries to workload.
Shant Hovsepian has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16280 Change subject: IMPALA-10034: Add remaining TPC-DS queries to workload. .. IMPALA-10034: Add remaining TPC-DS queries to workload. Include remaining TPC-DS queries to the testdata workload definition. Change-Id: Id5436689390f149694f14e6da1df624de4f5f7ad --- A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q23-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q23-2.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q24-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q24-2.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q28.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q35.test D testdata/workloads/tpcds/queries/tpcds-decimal_v2-q38-rewrite.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q38.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q44.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q49.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q66.test M testdata/workloads/tpcds/queries/tpcds-decimal_v2-q8.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q87.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q90.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q93.test M tests/query_test/test_tpcds_queries.py M tests/util/parse_util.py 17 files changed, 1,249 insertions(+), 104 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/16280/1 -- To view, visit http://gerrit.cloudera.org:8080/16280 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id5436689390f149694f14e6da1df624de4f5f7ad Gerrit-Change-Number: 16280 Gerrit-PatchSet: 1 Gerrit-Owner: Shant Hovsepian
[Impala-ASF-CR] IMPALA-10034: Add remaining TPC-DS queries to workload.
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16280 ) Change subject: IMPALA-10034: Add remaining TPC-DS queries to workload. .. Patch Set 1: Adding the missing tpc-ds queries. Easy spot to compare the query against https://github.com/cwida/tpcds-result-reproduction Assuming decimalv2 will be the default soon, so only added templates for those queries. Also there are slight variances from the answer sets due to decimal rounding, I assume this is expected behavior but wouldn't mind a second set of eyes. -- To view, visit http://gerrit.cloudera.org:8080/16280 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id5436689390f149694f14e6da1df624de4f5f7ad Gerrit-Change-Number: 16280 Gerrit-PatchSet: 1 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 04 Aug 2020 02:23:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10034: Add remaining TPC-DS queries to workload.
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16280 to look at the new patch set (#2). Change subject: IMPALA-10034: Add remaining TPC-DS queries to workload. .. IMPALA-10034: Add remaining TPC-DS queries to workload. Include remaining TPC-DS queries to the testdata workload definition. Q8 and Q38 were using non standard variants, those have been replaced by the official query versions. Q35 is using an official variant. Had to escape a table alias in Q90 as we treat 'AT' as a reserved keyword. Change-Id: Id5436689390f149694f14e6da1df624de4f5f7ad --- A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q23-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q23-2.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q24-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q24-2.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q28.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q35.test D testdata/workloads/tpcds/queries/tpcds-decimal_v2-q38-rewrite.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q38.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q44.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q49.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q66.test M testdata/workloads/tpcds/queries/tpcds-decimal_v2-q8.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q87.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q90.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q93.test M tests/query_test/test_tpcds_queries.py M tests/util/parse_util.py 17 files changed, 1,249 insertions(+), 104 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/16280/2 -- To view, visit http://gerrit.cloudera.org:8080/16280 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id5436689390f149694f14e6da1df624de4f5f7ad Gerrit-Change-Number: 16280 Gerrit-PatchSet: 2 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10034: Add remaining TPC-DS queries to workload.
Hello Aman Sinha, Fang-Yu Rao, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16280 to look at the new patch set (#4). Change subject: IMPALA-10034: Add remaining TPC-DS queries to workload. .. IMPALA-10034: Add remaining TPC-DS queries to workload. Include remaining TPC-DS queries to the testdata workload definition. Q8 and Q38 were using non standard variants, those have been replaced by the official query versions. Q35 is using an official variant. Had to escape a table alias in Q90 as we treat 'AT' as a reserved keyword. Change-Id: Id5436689390f149694f14e6da1df624de4f5f7ad --- A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q23-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q23-2.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q24-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q24-2.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q28.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q35.test D testdata/workloads/tpcds/queries/tpcds-decimal_v2-q38-rewrite.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q38.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q44.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q49.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q66.test M testdata/workloads/tpcds/queries/tpcds-decimal_v2-q8.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q87.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q90.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q93.test M tests/query_test/test_tpcds_queries.py M tests/util/parse_util.py 17 files changed, 1,248 insertions(+), 104 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/16280/4 -- To view, visit http://gerrit.cloudera.org:8080/16280 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id5436689390f149694f14e6da1df624de4f5f7ad Gerrit-Change-Number: 16280 Gerrit-PatchSet: 4 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5022: Outer join simplification
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16266 ) Change subject: IMPALA-5022: Outer join simplification .. Patch Set 5: (10 comments) Hi Xianqing, thank you so much for this contribution! I'll need to do another pass and go over the tests but here are some initial comments. http://gerrit.cloudera.org:8080/#/c/16266/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16266/5//COMMIT_MSG@17 PS5, Line 17: one null rejecting condition on the inner table. Consider adding a query option like DISABLE_OUTER_TO_INNER_REWRITE so disable this optimization if needed as runtime. http://gerrit.cloudera.org:8080/#/c/16266/5//COMMIT_MSG@34 PS5, Line 34: * Ran the full set of verifications in Impala Public Jenkins Please try out TPC-DS Q49 there the LOJ queries in there should be rewritten. http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3217 PS5, Line 3217: getWhereClauseConjuncts( Technically this would include having clause conjuncts as well, so might be misleading to name this function getWhereClauseConuncts. http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3227 PS5, Line 3227: } As a further optimization you could use getEquivClassesOnTuples() to also check for null filtering conditions that come as a result of a transitive relationship. For example T1 LEFT OUTER JOIN T2 ON (T1.a = T2.a) JOIN T3 ON (T3.b=T2.b) WHERE T3.b > 10; http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3270 PS5, Line 3270: analyzeNoThrow > For some common SQL functions, we probably can directly test their existen Agreed it would be good to have some static expressions that we know won't reject nulls for example. col IS NULL col1 IS DISTINCT FROM col2 for things like IN and COALESCE you would recursively check the children. IF and CASE are trickier so you might want to call the BE or just skip those. http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3427 PS5, Line 3427: // Recompute the graph since we may need to add value-transfer edges based on the See later comment in Planner, but it might be better to return this and have the caller recompute the graph. http://gerrit.cloudera.org:8080/#/c/16266/4/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/16266/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@980 PS4, Line 980: his instanceof CompoundPredicate : && ((CompoundPredicate) this).getOp() == CompoundPredicate.Operator.OR You could use Expr.IS_OR_PREDICATE(this) here. http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@978 PS5, Line 978: public List getDisjunctiveConjuncts() { There is something off about this interface. You assume the caller the first time this is called has verified that the predicate is an OR. For example is someone called this function with just a plan Expr then it would return the Expr back. You might want to move the IS_OR_PREDICATE call from Analyzer.java#3245 into it's own wrapper function, which then calls this method. http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@748 PS5, Line 748: // Transform outer join into inner join whenever possible Might want to use some state in the analyzer to check if any Outer Joins exist in the query and only then call this function then. For example globalstate_.outerJountTupleIds. http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@749 PS5, Line 749: analyzer.simplifyOuterJoins(selectStmt.getTableRefs()); It would be if you returned some indicator that the value transfer graph needs to be recomputed. Then recompute the graph here so you can make the time line event accordingly. ctx_.getTimeline().markEvent("Recomputing value transfer graph") Also if the SingleNodePlanner's valueTransferGraphNeedsUpdate_ was set to true you could likely reset it after you recompute the graph. -- To view, visit http://gerrit.cloudera.org:8080/16266 To unsubscribe, visit
[Impala-ASF-CR] IMPALA-10034: Add remaining TPC-DS queries to workload.
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16280 ) Change subject: IMPALA-10034: Add remaining TPC-DS queries to workload. .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/16280/4/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q23-1.test File testdata/workloads/tpcds/queries/tpcds-decimal_v2-q23-1.test: http://gerrit.cloudera.org:8080/#/c/16280/4/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q23-1.test@33 PS4, Line 33: select sum(sales) > Is there any specific reason why we have two tests for query 23 that are al The TPCDS query is written as two queries in one file. We separate it out like this to make the test diffing easier. The slight difference is just how TPC-DS designed this workload. http://gerrit.cloudera.org:8080/#/c/16280/4/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q24-1.test File testdata/workloads/tpcds/queries/tpcds-decimal_v2-q24-1.test: http://gerrit.cloudera.org:8080/#/c/16280/4/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q24-1.test@45 PS4, Line 45: where i_color = 'peach' > Is there any specific reason why we have two tests for query 24 that are al Same as above, the TPC-DS specification has it written as two semi colon separated queries. -- To view, visit http://gerrit.cloudera.org:8080/16280 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id5436689390f149694f14e6da1df624de4f5f7ad Gerrit-Change-Number: 16280 Gerrit-PatchSet: 4 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 07 Aug 2020 22:25:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9897: GROUPING SETS/CUBE/ROLLUP parsing and analysis
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16112 ) Change subject: IMPALA-9897: GROUPING SETS/CUBE/ROLLUP parsing and analysis .. Patch Set 10: Code-Review+1 (2 comments) LGTM, nice approach of keeping the groupingExprs_ in the select stmt. http://gerrit.cloudera.org:8080/#/c/16112/10/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/16112/10/fe/src/main/cup/sql-parser.cup@3076 PS10, Line 3076: // * GROUP BY a, b, c WITH CUBE - non-standard CUBE syntax supported by some systems Add example of GROUP BY GROUPING SETS ? http://gerrit.cloudera.org:8080/#/c/16112/10/fe/src/main/java/org/apache/impala/analysis/GroupByClause.java File fe/src/main/java/org/apache/impala/analysis/GroupByClause.java: http://gerrit.cloudera.org:8080/#/c/16112/10/fe/src/main/java/org/apache/impala/analysis/GroupByClause.java@121 PS10, Line 121: private void addGroupingID(long id, List groupingExprs) throws AnalysisException { Nit but might be cleaner to use the JDK BitSet class especially down the line when filling out the multiagg info. -- To view, visit http://gerrit.cloudera.org:8080/16112 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I178e45de94d736630c97ae1ec4a92423cd74621f Gerrit-Change-Number: 16112 Gerrit-PatchSet: 10 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Jun 2020 16:52:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9917: grouping() and grouping id() support
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16140 ) Change subject: IMPALA-9917: grouping() and grouping_id() support .. Patch Set 12: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/16140 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If0b1640d606256c0fe9204d2a21a8f6d06abcdb6 Gerrit-Change-Number: 16140 Gerrit-PatchSet: 12 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 08 Jul 2020 04:37:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8954: Uncorrelated scalar subqueries in the select list
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16007 ) Change subject: IMPALA-8954: Uncorrelated scalar subqueries in the select list .. Patch Set 8: (1 comment) > Patch Set 7: Code-Review+1 > > (1 comment) > > LGTM. One suggestion below for testing. http://gerrit.cloudera.org:8080/#/c/16007/7/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test: http://gerrit.cloudera.org:8080/#/c/16007/7/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test@9036 PS7, Line 9036: |--90:EXCHANGE [UNPARTITIONED] > This query seems to be a special case where both left and right inputs of t Good point added an end user sample query which illustrates this. -- To view, visit http://gerrit.cloudera.org:8080/16007 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66 Gerrit-Change-Number: 16007 Gerrit-PatchSet: 8 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 04 Jul 2020 13:40:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8954: Uncorrelated scalar subqueries in the select list
Hello Aman Sinha, Fang-Yu Rao, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16007 to look at the new patch set (#8). Change subject: IMPALA-8954: Uncorrelated scalar subqueries in the select list .. IMPALA-8954: Uncorrelated scalar subqueries in the select list Extend StmtRewriter with the ability to rewrite scalar subqueries in the select list into cross joins. Currently the subquery must pass plan-time checks to determine that it returns a single row which may miss cases that may be valid at runtime or with more complex evaluation of the predicate expressions in the planner. Support for correlated subqueries will be a follow on change. Testing: * Added new analyzer tests, updated previous subquery tests * test_queries.py::TestQueries::test_subquery * Added test_tpcds_q9 to e2e and planner tests Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66 --- M fe/src/main/java/org/apache/impala/analysis/SelectList.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/datasets/tpcds/tpcds_schema_template.sql M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test M testdata/workloads/tpcds/queries/count.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test A testdata/workloads/tpcds/queries/tpcds-q9.test M tests/query_test/test_tpcds_queries.py M tests/util/parse_util.py 13 files changed, 1,572 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/16007/8 -- To view, visit http://gerrit.cloudera.org:8080/16007 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66 Gerrit-Change-Number: 16007 Gerrit-PatchSet: 8 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9917: grouping() and grouping id() support
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16140 ) Change subject: IMPALA-9917: grouping() and grouping_id() support .. Patch Set 7: (3 comments) Grouping Sets are usually useful when saved as temp or logical views, then sliced and diced to get to the right level. Any potential edge case with saving a grouping set query as a logical view? http://gerrit.cloudera.org:8080/#/c/16140/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16140/7//COMMIT_MSG@28 PS7, Line 28: 27 and 36 sho https://github.com/cwida/tpcds-result-reproduction/blob/master/answer_sets_nulls_last/36.ans https://github.com/cwida/tpcds-result-reproduction/blob/master/answer_sets_nulls_last/27.ans http://gerrit.cloudera.org:8080/#/c/16140/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/16140/7/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@394 PS7, Line 394: casingthe nit: casing__space__the http://gerrit.cloudera.org:8080/#/c/16140/7/fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java File fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java: http://gerrit.cloudera.org:8080/#/c/16140/7/fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java@150 PS7, Line 150:* analysis. Mention in the comment that it's used for implementing grouping() as an example? -- To view, visit http://gerrit.cloudera.org:8080/16140 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If0b1640d606256c0fe9204d2a21a8f6d06abcdb6 Gerrit-Change-Number: 16140 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 07 Jul 2020 04:59:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8954: Uncorrelated scalar subqueries in the select list
Hello Aman Sinha, Fang-Yu Rao, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16007 to look at the new patch set (#9). Change subject: IMPALA-8954: Uncorrelated scalar subqueries in the select list .. IMPALA-8954: Uncorrelated scalar subqueries in the select list Extend StmtRewriter with the ability to rewrite scalar subqueries in the select list into cross joins. Currently the subquery must pass plan-time checks to determine that it returns a single row which may miss cases that may be valid at runtime or with more complex evaluation of the predicate expressions in the planner. Support for correlated subqueries will be a follow on change. Testing: * Added new analyzer tests, updated previous subquery tests * test_queries.py::TestQueries::test_subquery * Added test_tpcds_q9 to e2e and planner tests Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66 --- M fe/src/main/java/org/apache/impala/analysis/SelectList.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/datasets/tpcds/tpcds_schema_template.sql M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test M testdata/workloads/tpcds/queries/count.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test A testdata/workloads/tpcds/queries/tpcds-q9.test M tests/query_test/test_tpcds_queries.py M tests/util/parse_util.py 13 files changed, 1,572 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/16007/9 -- To view, visit http://gerrit.cloudera.org:8080/16007 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66 Gerrit-Change-Number: 16007 Gerrit-PatchSet: 9 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8954: Uncorrelated scalar subqueries in the select list
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16007 ) Change subject: IMPALA-8954: Uncorrelated scalar subqueries in the select list .. Patch Set 8: (11 comments) Addressed some comments. http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1113 PS3, Line 1113: > nit: implemented Done http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1117 PS3, Line 1117: * > Is this comment applicable for this patch ? Looks like this patch is suppor Done http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1135 PS3, Line 1135: * > nit: spelling 'gaurantee' Done http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1230 PS3, Line 1230: // rewrite to a LOJ. > If the outer query has 2 tables joined in the FROM clause, then we add this Done http://gerrit.cloudera.org:8080/#/c/16007/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/16007/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1114 PS5, Line 1114: implemeted > nit: implemented Done http://gerrit.cloudera.org:8080/#/c/16007/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1126 PS5, Line 1126: flatten > nit: flattened Done http://gerrit.cloudera.org:8080/#/c/16007/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1136 PS5, Line 1136: gaurantee > nit: guarantee Done http://gerrit.cloudera.org:8080/#/c/16007/7/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test: http://gerrit.cloudera.org:8080/#/c/16007/7/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test@9036 PS7, Line 9036: |--90:EXCHANGE [UNPARTITIONED] > On a related note, we probably *should* be doing a broadcast when the right Ack http://gerrit.cloudera.org:8080/#/c/16007/3/testdata/workloads/functional-query/queries/QueryTest/subquery.test File testdata/workloads/functional-query/queries/QueryTest/subquery.test: http://gerrit.cloudera.org:8080/#/c/16007/3/testdata/workloads/functional-query/queries/QueryTest/subquery.test@1044 PS3, Line 1044: select id, 1+(select min(id) from functional.alltypessmall) > Would be useful to also add the Explain plan for some of these tests either Done http://gerrit.cloudera.org:8080/#/c/16007/3/testdata/workloads/functional-query/queries/QueryTest/subquery.test@1086 PS3, Line 1086: > Not sure what this means - without grouping? Meant that the parent query had aggregation, and the outer and inner aggregations weren't interfering. This is a good case for subquery coalescing / merging down the line. http://gerrit.cloudera.org:8080/#/c/16007/3/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test File testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test: http://gerrit.cloudera.org:8080/#/c/16007/3/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test@3 PS3, Line 3: select case when (select count(*) > Cool that this query is now supported. Can we also add the plan for this u Done -- To view, visit http://gerrit.cloudera.org:8080/16007 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66 Gerrit-Change-Number: 16007 Gerrit-PatchSet: 8 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 04 Jul 2020 19:33:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8984: Uncorrelated scalar subqueries in the select list
Hello Aman Sinha, Fang-Yu Rao, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16007 to look at the new patch set (#7). Change subject: IMPALA-8984: Uncorrelated scalar subqueries in the select list .. IMPALA-8984: Uncorrelated scalar subqueries in the select list Extend StmtRewriter with the ability to rewrite scalar subqueries in the select list into cross joins. Currently the subquery must pass plan-time checks to determine that it returns a single row which may miss cases that may be valid at runtime or with more complex evaluation of the predicate expressions in the planner. Support for correlated subqueries will be a follow on change. Testing: * Added new analyzer tests, updated previous subquery tests * test_queries.py::TestQueries::test_subquery * Added test_tpcds_q9 to e2e and planner tests Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66 --- M fe/src/main/java/org/apache/impala/analysis/SelectList.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/datasets/tpcds/tpcds_schema_template.sql M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test M testdata/workloads/tpcds/queries/count.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test A testdata/workloads/tpcds/queries/tpcds-q9.test M tests/query_test/test_tpcds_queries.py M tests/util/parse_util.py 12 files changed, 1,438 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/16007/7 -- To view, visit http://gerrit.cloudera.org:8080/16007 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66 Gerrit-Change-Number: 16007 Gerrit-PatchSet: 7 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8984: Uncorrelated scalar subqueries in the select list
Hello Aman Sinha, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16007 to look at the new patch set (#6). Change subject: IMPALA-8984: Uncorrelated scalar subqueries in the select list .. IMPALA-8984: Uncorrelated scalar subqueries in the select list Extend StmtRewriter with the ability to rewrite scalar subqueries in the select list into cross joins. Currently the subquery must pass plan-time checks to determine that it returns a single row which may miss cases that may be valid at runtime or with more complex evaluation of the predicate expressions in the planner. Support for correlated subqueries will be a follow on change. Testing: * Added new analyzer tests, updated previous subquery tests * test_queries.py::TestQueries::test_subquery * Added test_tpcds_q9 to e2e and planner tests Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66 --- M fe/src/main/java/org/apache/impala/analysis/SelectList.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/datasets/tpcds/tpcds_schema_template.sql M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test M testdata/workloads/tpcds/queries/count.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test A testdata/workloads/tpcds/queries/tpcds-q9.test M tests/query_test/test_tpcds_queries.py M tests/util/parse_util.py 12 files changed, 1,438 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/16007/6 -- To view, visit http://gerrit.cloudera.org:8080/16007 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66 Gerrit-Change-Number: 16007 Gerrit-PatchSet: 6 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8984: Uncorrelated scalar subqueries in the select list
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16007 ) Change subject: IMPALA-8984: Uncorrelated scalar subqueries in the select list .. Patch Set 6: Fangu-Yu, FYI tweaked the reason table schema to be consistent with references to the column from other tables i.e. *_reason_sk -- To view, visit http://gerrit.cloudera.org:8080/16007 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66 Gerrit-Change-Number: 16007 Gerrit-PatchSet: 6 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Jul 2020 22:14:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9897: GROUPING SETS/CUBE/ROLLUP parsing and analysis
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16112 ) Change subject: IMPALA-9897: GROUPING SETS/CUBE/ROLLUP parsing and analysis .. Patch Set 13: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/16112/10/fe/src/main/java/org/apache/impala/analysis/GroupByClause.java File fe/src/main/java/org/apache/impala/analysis/GroupByClause.java: http://gerrit.cloudera.org:8080/#/c/16112/10/fe/src/main/java/org/apache/impala/analysis/GroupByClause.java@121 PS10, Line 121: private void addGroupingID(long id, List groupingExprs) throws AnalysisException { > It's probably easier to leave as-is, if only because enumerating classes fo Ack -- To view, visit http://gerrit.cloudera.org:8080/16112 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I178e45de94d736630c97ae1ec4a92423cd74621f Gerrit-Change-Number: 16112 Gerrit-PatchSet: 13 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Jul 2020 22:04:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9784, IMPALA-9905: Uncorrelated subqueries in HAVING.
Hello Aman Sinha, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16052 to look at the new patch set (#5). Change subject: IMPALA-9784, IMPALA-9905: Uncorrelated subqueries in HAVING. .. IMPALA-9784, IMPALA-9905: Uncorrelated subqueries in HAVING. Support rewriting subqueries in the HAVING clause by nesting the aggregation query and pulling up the subquery predicates into the outer WHERE clause. Testing: * New analyzer tests * New functional subquery tests * Added Q23, Q24 and Q44 to the tpcds workload * Ran subquery rewrite tests Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461 --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test A testdata/workloads/tpcds/queries/tpcds-q23-1.test A testdata/workloads/tpcds/queries/tpcds-q23-2.test A testdata/workloads/tpcds/queries/tpcds-q24-1.test A testdata/workloads/tpcds/queries/tpcds-q24-2.test A testdata/workloads/tpcds/queries/tpcds-q44.test M tests/query_test/test_tpcds_queries.py 12 files changed, 1,192 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/16052/5 -- To view, visit http://gerrit.cloudera.org:8080/16052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461 Gerrit-Change-Number: 16052 Gerrit-PatchSet: 5 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9784, IMPALA-9905: Uncorrelated subqueries in HAVING.
Hello Aman Sinha, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16052 to look at the new patch set (#4). Change subject: IMPALA-9784, IMPALA-9905: Uncorrelated subqueries in HAVING. .. IMPALA-9784, IMPALA-9905: Uncorrelated subqueries in HAVING. Support rewriting subqueries in the HAVING clause by nesting the aggregation query and pulling up the subquery predicates into the outer WHERE clause. Testing: * New analyzer tests * New functional subquery tests * Added Q23, Q24 and Q44 to the tpcds workload * Ran subquery rewrite tests Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461 --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test A testdata/workloads/tpcds/queries/tpcds-q23-1.test A testdata/workloads/tpcds/queries/tpcds-q23-2.test A testdata/workloads/tpcds/queries/tpcds-q24-1.test A testdata/workloads/tpcds/queries/tpcds-q24-2.test A testdata/workloads/tpcds/queries/tpcds-q44.test M tests/query_test/test_tpcds_queries.py 12 files changed, 1,192 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/16052/4 -- To view, visit http://gerrit.cloudera.org:8080/16052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461 Gerrit-Change-Number: 16052 Gerrit-PatchSet: 4 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9949: fix SELECT list subqueries with HAVING/LIMIT
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16171 ) Change subject: IMPALA-9949: fix SELECT list subqueries with HAVING/LIMIT .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/16171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6034aedac776783bdc8cdb3a2df344e2b3662da6 Gerrit-Change-Number: 16171 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 13 Jul 2020 01:39:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-891: INTERSECT and EXCEPT Support
Shant Hovsepian has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16123 Change subject: IMPALA-891: INTERSECT and EXCEPT Support .. IMPALA-891: INTERSECT and EXCEPT Support [WIP] Lots of code cleanup and intersect/except testing. TPC-DS queries with intersect / except pass Union tests pass. TODO: separate methods between UnionStmt & SetOperationStmt TODO: ValuesStmt in intersect / except Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java A fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14a.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14b.test A testdata/workloads/tpcds/queries/tpcds-q14a.test A testdata/workloads/tpcds/queries/tpcds-q14b.test A testdata/workloads/tpcds/queries/tpcds-q38.test A testdata/workloads/tpcds/queries/tpcds-q87.test M tests/query_test/test_tpcds_queries.py 22 files changed, 2,243 insertions(+), 589 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16123/1 -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 1 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5226: handle single subquery in or predicate
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16152 ) Change subject: IMPALA-5226: handle single subquery in or predicate .. Patch Set 6: Code-Review+1 (3 comments) http://gerrit.cloudera.org:8080/#/c/16152/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/16152/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@604 PS6, Line 604: We could implement by replacing or augment the nit: having trouble parsing this sentence. http://gerrit.cloudera.org:8080/#/c/16152/6/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test: http://gerrit.cloudera.org:8080/#/c/16152/6/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@3504 PS6, Line 3504: # Basic IN subquery in disjunct. I'm pretty sure the behavior is correct, but would be nice to have a test where the subquery returns NULL. a = 1 or b in (select NULL), the in predicate should return NULL then the IS NOT NULL predicate that is added will return false. In the or we'll then have a = 1 or false instead of a = 1 or null. That should be correct, I think the NOT IN case is where this breaks but we don't handle that rewrite. http://gerrit.cloudera.org:8080/#/c/16152/6/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q45.test File testdata/workloads/tpcds/queries/tpcds-decimal_v2-q45.test: http://gerrit.cloudera.org:8080/#/c/16152/6/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q45.test@2 PS6, Line 2: QUERY: TPCDS-Q45 This doesn't look like the qualification query from tpc-ds for q45. The parameters are on page 115 of http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.13.0.pdf https://github.com/cwida/tpcds-result-reproduction/blob/master/query_qualification/45.sql -- To view, visit http://gerrit.cloudera.org:8080/16152 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I64588992901afd7cd885419a0b7f949b0b174976 Gerrit-Change-Number: 16152 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 08 Jul 2020 22:55:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8984: Uncorrelated scalar subqueries in the select list
Hello Aman Sinha, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16007 to look at the new patch set (#4). Change subject: IMPALA-8984: Uncorrelated scalar subqueries in the select list .. IMPALA-8984: Uncorrelated scalar subqueries in the select list Extend StmtRewriter with the ability to rewrite scalar subqueries in the select list into cross joins. Currently the subquery must pass plan-time checks to determine that it returns a single row which may miss cases that may be valid at runtime or with more complex evaluation of the predicate expressions in the planner. Support for correlated subqueries will be a follow on change. With this change Q9 of TPC-DS is supported, we now load the 'reasons' table as part of the TPC-DS workload for use by Q9. Testing: * Added new analyzer tests, updated previous subquery tests * test_queries.py::TestQueries::test_subquery Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66 --- M fe/src/main/java/org/apache/impala/analysis/SelectList.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/datasets/tpcds/tpcds_schema_template.sql M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test M testdata/workloads/tpcds/queries/count.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test A testdata/workloads/tpcds/queries/tpcds-q9.test M tests/query_test/test_tpcds_queries.py 11 files changed, 1,455 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/16007/4 -- To view, visit http://gerrit.cloudera.org:8080/16007 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66 Gerrit-Change-Number: 16007 Gerrit-PatchSet: 4 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8984: Uncorrelated scalar subqueries in the select list
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16007 ) Change subject: IMPALA-8984: Uncorrelated scalar subqueries in the select list .. Patch Set 4: (9 comments) Some more tests and review comments addressed. Still want to get a good test run out of jenkins. http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@292 PS3, Line 292: "Invariant violated: Only subqueries that are guaranteed to return a " > nit: "guaranteed" Done http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@937 PS3, Line 937: * supported in the FROM clause, WHERE clause and SELECT list. The rewrite is > Update this comment for SELECT clause. Done http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1137 PS3, Line 1137: *returned per group so a run time cardinality check must be applied. An exception > nit: duplicate 'primary' Done http://gerrit.cloudera.org:8080/#/c/16007/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/16007/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1117 PS4, Line 1117: * Done http://gerrit.cloudera.org:8080/#/c/16007/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1117 PS4, Line 1117: * Done http://gerrit.cloudera.org:8080/#/c/16007/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1135 PS4, Line 1135: * Done http://gerrit.cloudera.org:8080/#/c/16007/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1230 PS4, Line 1230: // rewrite to a LOJ. I added a test for this. I know it feels weird but since the slotref for the subquery is marked as materialized and the other join queries get bound by the USING/ON clause, nothing explodes. Since there are scalar subqueries, the only weird situation is if the cardinality of all the joins where equal to 1 then it might get reordered but the results would still be correct. If it were a correlated subquery then we've need to handle things more carefully, but that's for a later commit. http://gerrit.cloudera.org:8080/#/c/16007/4/testdata/workloads/functional-query/queries/QueryTest/subquery.test File testdata/workloads/functional-query/queries/QueryTest/subquery.test: http://gerrit.cloudera.org:8080/#/c/16007/4/testdata/workloads/functional-query/queries/QueryTest/subquery.test@1044 PS4, Line 1044: select id, 1+(select min(id) from functional.alltypessmall) Added the tpc-ds query, it's a pretty complex plan. http://gerrit.cloudera.org:8080/#/c/16007/4/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test File testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test: http://gerrit.cloudera.org:8080/#/c/16007/4/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test@3 PS4, Line 3: select case when (select count(*) Done -- To view, visit http://gerrit.cloudera.org:8080/16007 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66 Gerrit-Change-Number: 16007 Gerrit-PatchSet: 4 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 19:38:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9784: Non correlated subqueries in HAVING.
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16052 to look at the new patch set (#2). Change subject: IMPALA-9784: Non correlated subqueries in HAVING. .. IMPALA-9784: Non correlated subqueries in HAVING. Support rewriting subqueries in the HAVING clause by nesting the aggregation query and pulling up the subquery predicates into the outer WHERE clause. Testing: * New analyzer tests * New functional subquery tests * Added Q23, Q24 and Q44 to the tpcds workload Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461 --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test A testdata/workloads/tpcds/queries/tpcds-q23-1.test A testdata/workloads/tpcds/queries/tpcds-q23-2.test A testdata/workloads/tpcds/queries/tpcds-q24-1.test A testdata/workloads/tpcds/queries/tpcds-q24-2.test A testdata/workloads/tpcds/queries/tpcds-q44.test M tests/query_test/test_tpcds_queries.py 11 files changed, 1,123 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/16052/2 -- To view, visit http://gerrit.cloudera.org:8080/16052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461 Gerrit-Change-Number: 16052 Gerrit-PatchSet: 2 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9784: Non correlated subqueries in HAVING.
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16052 ) Change subject: IMPALA-9784: Non correlated subqueries in HAVING. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/16052/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/16052/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@466 PS2, Line 466: // TODO: IMPALA-5100 to cover all cases, we do let through runtime scalars with I relaxed some of these rules to let through subqueries such as (select count(a) from t group by b where b=1). Referenced the jira to enhance the scalar subquery planner checks to handle more expression evaluation but for now thought the tradeoff was better to let these queries through wrapped in a CardinalityCheckNode. There are case where two different runtime scalar subqueries in nested query blocks could run and have runtime errors that interfere with each other since we don't have independent execution, but I checked and many other database (hive, vertica, vectorwise) have this kind of behavior. It feels like a worthwhile trade off to allow more queries to run where some might have a runtime error in an off chance when we'd just otherwise not let the query run at all. Also it's needed to support Q44 from TPC-DS. -- To view, visit http://gerrit.cloudera.org:8080/16052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461 Gerrit-Change-Number: 16052 Gerrit-PatchSet: 2 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 19:52:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16073 ) Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions .. Patch Set 4: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/16073/3/fe/src/main/java/org/apache/impala/analysis/CastExpr.java File fe/src/main/java/org/apache/impala/analysis/CastExpr.java: http://gerrit.cloudera.org:8080/#/c/16073/3/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@285 PS3, Line 285: (inType.isVarchar() || inType.getPrimitiveType() == PrimitiveType.STRING)) { > I think that would be a separate case in SortInfo.getMaterializedSortExprs( It is a weird pattern, I'd expect lpad or substr to be used more often, but I did some query searching and using casts to truncate like this does happen, probably some hold over from older SQL behavior. Not critical to address. -- To view, visit http://gerrit.cloudera.org:8080/16073 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6 Gerrit-Change-Number: 16073 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 03:25:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9866: Query plan debug page stuck in fetch and render loop.
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16092 ) Change subject: IMPALA-9866: Query plan debug page stuck in fetch and render loop. .. Patch Set 1: (2 comments) Added a field to indicate query completion. http://gerrit.cloudera.org:8080/#/c/16092/1/www/query_plan.tmpl File www/query_plan.tmpl: http://gerrit.cloudera.org:8080/#/c/16092/1/www/query_plan.tmpl@140 PS1, Line 140: var summary = json["summary"]; > Not sure about this bit. Isn't this set once the query starts running, not Yes in my testing I was using summary being filled to detect query completion, turns it the timing of fetching the response made it seem like this was a good indicator, but the summary is always. Instead added an attribute to the json response to indicate if the query was still in flight. http://gerrit.cloudera.org:8080/#/c/16092/1/www/query_plan.tmpl@229 PS1, Line 229: var intervalId = setInterval(refresh, 1000); > Probably refreshing every 1s is overkill anyway, since the status report in I upped it to 2s only because higher values made the animation of the plan metrics and color flow very choppy. -- To view, visit http://gerrit.cloudera.org:8080/16092 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie1f233c90d5f221813833af2e29be2250936d442 Gerrit-Change-Number: 16092 Gerrit-PatchSet: 1 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 19 Jun 2020 21:59:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9866: Query plan debug page stuck in fetch and render loop.
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16092 ) Change subject: IMPALA-9866: Query plan debug page stuck in fetch and render loop. .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/16092/1/www/query_plan.tmpl File www/query_plan.tmpl: http://gerrit.cloudera.org:8080/#/c/16092/1/www/query_plan.tmpl@140 PS1, Line 140: var summary = json["summary"]; > Not sure about this bit. Isn't this set once the query starts running, not Done http://gerrit.cloudera.org:8080/#/c/16092/1/www/query_plan.tmpl@229 PS1, Line 229: var intervalId = setInterval(refresh, 1000); > I upped it to 2s only because higher values made the animation of the plan Done -- To view, visit http://gerrit.cloudera.org:8080/16092 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie1f233c90d5f221813833af2e29be2250936d442 Gerrit-Change-Number: 16092 Gerrit-PatchSet: 1 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 19 Jun 2020 22:00:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9866: Query plan debug page stuck in fetch and render loop.
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16092 to look at the new patch set (#2). Change subject: IMPALA-9866: Query plan debug page stuck in fetch and render loop. .. IMPALA-9866: Query plan debug page stuck in fetch and render loop. Once a query is completed we stop fetching and rendering the plan. This speeds interaction with large query plans in the web browser as well as reduces some load on the query coordinator. Change-Id: Ie1f233c90d5f221813833af2e29be2250936d442 --- M be/src/service/impala-http-handler.cc M www/query_plan.tmpl 2 files changed, 8 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/16092/2 -- To view, visit http://gerrit.cloudera.org:8080/16092 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie1f233c90d5f221813833af2e29be2250936d442 Gerrit-Change-Number: 16092 Gerrit-PatchSet: 2 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9866: Query plan debug page stuck in fetch and render loop.
Shant Hovsepian has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16092 Change subject: IMPALA-9866: Query plan debug page stuck in fetch and render loop. .. IMPALA-9866: Query plan debug page stuck in fetch and render loop. Once a query is completed we stop fetching and rendering the plan. This speeds interaction with large query plans in the web browser as well as reduces some load on the query coordinator. Change-Id: Ie1f233c90d5f221813833af2e29be2250936d442 --- M www/query_plan.tmpl 1 file changed, 6 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/16092/1 -- To view, visit http://gerrit.cloudera.org:8080/16092 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ie1f233c90d5f221813833af2e29be2250936d442 Gerrit-Change-Number: 16092 Gerrit-PatchSet: 1 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9784, IMPALA-9905: Uncorrelated subqueries in HAVING.
Hello Aman Sinha, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16052 to look at the new patch set (#3). Change subject: IMPALA-9784, IMPALA-9905: Uncorrelated subqueries in HAVING. .. IMPALA-9784, IMPALA-9905: Uncorrelated subqueries in HAVING. Support rewriting subqueries in the HAVING clause by nesting the aggregation query and pulling up the subquery predicates into the outer WHERE clause. Testing: * New analyzer tests * New functional subquery tests * Added Q23, Q24 and Q44 to the tpcds workload * Ran subquery rewrite tests Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461 --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test A testdata/workloads/tpcds/queries/tpcds-q23-1.test A testdata/workloads/tpcds/queries/tpcds-q23-2.test A testdata/workloads/tpcds/queries/tpcds-q24-1.test A testdata/workloads/tpcds/queries/tpcds-q24-2.test A testdata/workloads/tpcds/queries/tpcds-q44.test M tests/query_test/test_tpcds_queries.py 12 files changed, 1,194 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/16052/3 -- To view, visit http://gerrit.cloudera.org:8080/16052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461 Gerrit-Change-Number: 16052 Gerrit-PatchSet: 3 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9784, IMPALA-9905: Uncorrelated subqueries in HAVING.
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16052 ) Change subject: IMPALA-9784, IMPALA-9905: Uncorrelated subqueries in HAVING. .. Patch Set 3: (4 comments) CR comments addressed, still working on clean jenkins runs. http://gerrit.cloudera.org:8080/#/c/16052/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/16052/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@465 PS2, Line 465: // TODO: Remove this when independent subquery evaluation is implemented. > Can we add a simple planner test for this rewrite? With the cardinality che Done http://gerrit.cloudera.org:8080/#/c/16052/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@466 PS2, Line 466: // TODO: IMPALA-5100 to cover all cases, we do let through runtime scalars with > It might be helpful to file a separate JIRA with an example of this query, IMPALA-1285 already existed to cover this case, however the way the JIRA is worded in such way to cover the case of detecting these situations at plan time. I had linked this JIRA with IMPALA-5100 which is the blanket issue for runtime scalar checks. Adding IMPALA-9905 to cover the case of queries that can now run after this change. http://gerrit.cloudera.org:8080/#/c/16052/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@469 PS2, Line 469: presence > nit: presence Done http://gerrit.cloudera.org:8080/#/c/16052/2/testdata/workloads/functional-query/queries/QueryTest/subquery.test File testdata/workloads/functional-query/queries/QueryTest/subquery.test: http://gerrit.cloudera.org:8080/#/c/16052/2/testdata/workloads/functional-query/queries/QueryTest/subquery.test@1218 PS2, Line 1218: from alltypestiny group by id > nit: remove the functional database name here and below - the test framewor Done -- To view, visit http://gerrit.cloudera.org:8080/16052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461 Gerrit-Change-Number: 16052 Gerrit-PatchSet: 3 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 28 Jun 2020 18:16:27 + Gerrit-HasComments: Yes