[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. IMPALA-7659: Populate NULL count while computing column stats It was disabled for performance reasons (IMPALA-1003) and this patch re-enables it since a lot of codegen improvements have happened since then. This patch switches the aggregation to use the CASE conditional instead of IF since the former has proper codegen support (IMPALA-7655). Tests: = - Updated the affected tests to include the null counts. - Added unit tests that verify IS [NOT] NULL predicates' cardinality estimation. Perf note: = I reran the compute stats child query with null counts included on the store_sales table from 1000 SF (1TB) tpcds dataset. The table had 22 non-partitioned columns (on which null counts were computed) and ~2.8B rows. This experiment showed around 7-8% perf drop compared to the same child query without null counts for these columns. Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Reviewed-on: http://gerrit.cloudera.org:8080/11565 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-keywords.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test M testdata/workloads/functional-query/queries/QueryTest/show-stats.test 14 files changed, 1,295 insertions(+), 1,304 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 12 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 11 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Sat, 08 Dec 2018 00:21:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 11 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 07 Dec 2018 20:36:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 11: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3539/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 11 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 07 Dec 2018 20:36:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 9: Code-Review+2 Carrying +2. -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 9 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 07 Dec 2018 06:50:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3538/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 07 Dec 2018 06:50:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 07 Dec 2018 06:50:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 9: Code-Review+1 LGTM -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 9 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 07 Dec 2018 00:14:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Bharath Vissapragada has uploaded a new patch set (#8) to the change originally created by piotr.findei...@gmail.com. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. IMPALA-7659: Populate NULL count while computing column stats It was disabled for performance reasons (IMPALA-1003) and this patch re-enables it since a lot of codegen improvements have happened since then. This patch switches the aggregation to use the CASE conditional instead of IF since the former has proper codegen support (IMPALA-7655). Tests: Updated the affected tests to include the null counts. Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a --- M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-keywords.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test M testdata/workloads/functional-query/queries/QueryTest/show-stats.test 14 files changed, 1,295 insertions(+), 1,304 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/11565/8 -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 8 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 9: Tim/Paul: Mind taking another quick look at the updated test and commit message? Thanks. -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 9 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 06 Dec 2018 21:24:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Bharath Vissapragada has uploaded a new patch set (#9) to the change originally created by piotr.findei...@gmail.com. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. IMPALA-7659: Populate NULL count while computing column stats It was disabled for performance reasons (IMPALA-1003) and this patch re-enables it since a lot of codegen improvements have happened since then. This patch switches the aggregation to use the CASE conditional instead of IF since the former has proper codegen support (IMPALA-7655). Tests: = - Updated the affected tests to include the null counts. - Added unit tests that verify IS [NOT] NULL predicates' cardinality estimation. Perf note: = I reran the compute stats child query with null counts included on the store_sales table from 1000 SF (1TB) tpcds dataset. The table had 22 non-partitioned columns (on which null counts were computed) and ~2.8B rows. This experiment showed around 7-8% perf drop compared to the same child query without null counts for these columns. Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a --- M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-keywords.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test M testdata/workloads/functional-query/queries/QueryTest/show-stats.test 14 files changed, 1,295 insertions(+), 1,304 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/11565/9 -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 9 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 7: Code-Review-1 (2 comments) Tim, you raised perfectly valid concerns. I'll wait for https://gerrit.cloudera.org/#/c/11920/ to be merged since it adds the necessary unit-test infrastructure. Please find the perf test result in the comment. (Parking it as -1 while the other GVO is running). http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG@16 PS7, Line 16: Tests: Updated the affected tests to include the null counts. > FWIW, IMPALA-7842 provides a starter set of cardinality tests based on expo Tim, I agree we need a unit test for this. Thanks for pointing it out. I'll wait for IMPALA-7842 to go in, rebase on top of it and add it there since it provides the necessary plumbing. http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251 PS6, Line 251: > IMPALA-1003 doesn't include many breadcrumbs. I do wonder if there were oth I tried this on the store_sales table from 1TB tpcds/parquet dataset. The table has 22 non-partitioned columns and ~2.8B rows. I ran the child query with and without null count and I noticed 7-8% slowdown. (I warmed up the cluster by running the queries until their runtime stabilized). ~64s without null count / ~70s with null count. I don't see why "refresh" would be affected? (unless I'm missing something) -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 7 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 06 Dec 2018 04:03:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 7: It'd still be good to do a sanity check for performance, just on your dev machine, if possible, but that shouldn't block getting this in. -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 7 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 06 Dec 2018 01:45:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 7: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251 PS6, Line 251: > Agreed. Let's get this in, then tackle the NDV=0 as a separate issue. IMPALA-1003 doesn't include many breadcrumbs. I do wonder if there were other, more severe, issues back when it was disabled - e.g. codegen was disabled for the whole aggregation or something. I just quickly tried an experiment measuring the time added to the aggregation for each additional aggregate function of this nature and the delta from each added function is very small. -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 7 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 06 Dec 2018 01:43:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 7: Code-Review+1 (2 comments) My vote is to get this in, then do three things: 1. Use IMPALA-7842 to add cardinality tests based on this feature. 2. Do some refresh metadata performance runs to check performance impact. 3. Tackle the NDV-does-or-does-not-include-nulls issue. http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG@16 PS7, Line 16: Tests: Updated the affected tests to include the null counts. > Can we add a couple of tests that verify that cardinality estimates for out FWIW, IMPALA-7842 provides a starter set of cardinality tests based on exposing the pre-Thrift plan tree. We can build on those if that patch goes in before this one. http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251 PS6, Line 251: > Was discussing this with Paul offline. We thought that adjusting the NDV be Agreed. Let's get this in, then tackle the NDV=0 as a separate issue. I wonder, do we have any data about the original issue: any performance slowness when adding this additional calculation? If a table has many columns, and we add a null count for each, how much impact is there on refresh metadata performance? -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 7 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 05 Dec 2018 21:18:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG@16 PS7, Line 16: Tests: Updated the affected tests to include the null counts. Can we add a couple of tests that verify that cardinality estimates for output of "IS NULL" and "IS NOT NULL" are correct? It would also be good to do a sanity check of the compute stats performance change on a largish table. I'd expect the difference will be small but we should confirm. -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 7 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 05 Dec 2018 18:11:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 6: (1 comment) Any reviews on this one? Planning to get this one in before others keep updating these tests. http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251 PS6, Line 251: " IS NULL THEN 1 ELSE NULL END)"); > I thought you were doing it in a separate change? https://gerrit.cloudera.o Was discussing this with Paul offline. We thought that adjusting the NDV before writing the stats to HMS is problematic since it might affect other SQL engines (like Presto/Hive etc). We don't know if they expect the null count to be adjusted already. -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 6 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 05 Dec 2018 16:32:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1511/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 7 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 03 Dec 2018 17:49:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 7: (1 comment) Rebased. Core and exhaustive tests passed. PS7 is ready for review. http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251 PS6, Line 251: > If we compute the null count, can we also solve the NDV=0 issue? Before sav I thought you were doing it in a separate change? https://gerrit.cloudera.org/#/c/11528/ I think it is nice to do it as a separate change so that it is easy to revert if something blows up. -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 7 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 03 Dec 2018 17:16:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Bharath Vissapragada has uploaded a new patch set (#7) to the change originally created by piotr.findei...@gmail.com. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. IMPALA-7659: Populate NULL count while computing column stats It was disabled for performance reasons (IMPALA-1003) and this patch re-enables it since a lot of codegen improvements have happened since then. This patch switches the aggregation to use the CASE conditional instead of IF since the former has proper codegen support (IMPALA-7655). Tests: Updated the affected tests to include the null counts. Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a --- M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-keywords.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test M testdata/workloads/functional-query/queries/QueryTest/show-stats.test 13 files changed, 1,289 insertions(+), 1,304 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/11565/7 -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 7 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 6 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Sat, 01 Dec 2018 02:22:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251 PS6, Line 251: " IS NULL THEN 1 ELSE NULL END)"); If we compute the null count, can we also solve the NDV=0 issue? Before saving the NDV, add one if the null count is greater than zero. This will ensure that the NDV count here matches the planner's expectation of NDV-including-nulls. -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 6 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Sat, 01 Dec 2018 00:29:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1484/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 6 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 30 Nov 2018 23:02:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 6: Reviewers, please hold off until I confirm that the core/exhaustive builds are green. If not, I'll probably have to patch a few more tests in next PS. -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 6 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 30 Nov 2018 22:38:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 6: (2 comments) I fixed the tests based on the failures I noticed in a core run. I'll trigger another test run to see if it comes out green. Meanwhile, I'm also running an exhaustive run so that there are no surprises later. http://gerrit.cloudera.org:8080/#/c/11565/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11565/5//COMMIT_MSG@10 PS5, Line 10: re-enables it since a lot of codegen improvements have happened since > This fix would be helpful for IMPALA-7310: if NDV=0, we need to know if nul Agreed. http://gerrit.cloudera.org:8080/#/c/11565/5/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/11565/5/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251 PS5, Line 251: " IS NULL THEN 1 ELSE NULL END)"); > If this expression is faster than an if() function (IMPALA-7655), then does Agree with you. I think it's better to do it as a separate change. -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 6 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 30 Nov 2018 22:31:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3509/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 6 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 30 Nov 2018 22:31:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats
Bharath Vissapragada has uploaded a new patch set (#6) to the change originally created by piotr.findei...@gmail.com. ( http://gerrit.cloudera.org:8080/11565 ) Change subject: IMPALA-7659: Populate NULL count while computing column stats .. IMPALA-7659: Populate NULL count while computing column stats It was disabled for performance reasons (IMPALA-1003) and this patch re-enables it since a lot of codegen improvements have happened since then. This patch switches the aggregation to use the CASE conditional instead of IF since the former has proper codegen support (IMPALA-7655). Tests: Updated the affected tests to include the null counts. Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a --- M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-keywords.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test M testdata/workloads/functional-query/queries/QueryTest/show-stats.test 13 files changed, 1,291 insertions(+), 1,306 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/11565/6 -- To view, visit http://gerrit.cloudera.org:8080/11565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a Gerrit-Change-Number: 11565 Gerrit-PatchSet: 6 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac