[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..

IMPALA-7659: Populate NULL count while computing column stats

It was disabled for performance reasons (IMPALA-1003) and this patch
re-enables it since a lot of codegen improvements have happened since
then.

This patch switches the aggregation to use the CASE conditional instead
of IF since the former has proper codegen support (IMPALA-7655).

Tests:
=

- Updated the affected tests to include the null counts.
- Added unit tests that verify IS [NOT] NULL predicates' cardinality
  estimation.

Perf note:
=

I reran the compute stats child query with null counts included on the
store_sales table from 1000 SF (1TB) tpcds dataset. The table had 22
non-partitioned columns (on which null counts were computed) and ~2.8B
rows. This experiment showed around 7-8% perf drop compared to the same
child query without null counts for these columns.

Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Reviewed-on: http://gerrit.cloudera.org:8080/11565
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-keywords.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M 
testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
14 files changed, 1,295 insertions(+), 1,304 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 11: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Sat, 08 Dec 2018 00:21:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 11: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 07 Dec 2018 20:36:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3539/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 07 Dec 2018 20:36:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-06 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 9: Code-Review+2

Carrying +2.


--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 07 Dec 2018 06:50:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3538/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 07 Dec 2018 06:50:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 07 Dec 2018 06:50:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-06 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 9: Code-Review+1

LGTM


--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 07 Dec 2018 00:14:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-06 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#8) to the change originally 
created by piotr.findei...@gmail.com. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..

IMPALA-7659: Populate NULL count while computing column stats

It was disabled for performance reasons (IMPALA-1003) and this patch
re-enables it since a lot of codegen improvements have happened since
then.

This patch switches the aggregation to use the CASE conditional instead
of IF since the former has proper codegen support (IMPALA-7655).

Tests: Updated the affected tests to include the null counts.

Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-keywords.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M 
testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
14 files changed, 1,295 insertions(+), 1,304 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/11565/8
--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 8
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-06 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 9:

Tim/Paul: Mind taking another quick look at the updated test and commit 
message? Thanks.


--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 06 Dec 2018 21:24:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-06 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#9) to the change originally 
created by piotr.findei...@gmail.com. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..

IMPALA-7659: Populate NULL count while computing column stats

It was disabled for performance reasons (IMPALA-1003) and this patch
re-enables it since a lot of codegen improvements have happened since
then.

This patch switches the aggregation to use the CASE conditional instead
of IF since the former has proper codegen support (IMPALA-7655).

Tests:
=

- Updated the affected tests to include the null counts.
- Added unit tests that verify IS [NOT] NULL predicates' cardinality
  estimation.

Perf note:
=

I reran the compute stats child query with null counts included on the
store_sales table from 1000 SF (1TB) tpcds dataset. The table had 22
non-partitioned columns (on which null counts were computed) and ~2.8B
rows. This experiment showed around 7-8% perf drop compared to the same
child query without null counts for these columns.

Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-keywords.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M 
testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
14 files changed, 1,295 insertions(+), 1,304 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/11565/9
--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-05 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 7: Code-Review-1

(2 comments)

Tim, you raised perfectly valid concerns. I'll wait for 
https://gerrit.cloudera.org/#/c/11920/ to be merged since it adds the necessary 
unit-test infrastructure.  Please find the perf test result in the comment. 
(Parking it as -1 while the other GVO is running).

http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG@16
PS7, Line 16: Tests: Updated the affected tests to include the null counts.
> FWIW, IMPALA-7842 provides a starter set of cardinality tests based on expo
Tim, I agree we need a unit test for this. Thanks for pointing it out. I'll 
wait for IMPALA-7842 to go in, rebase on top of it and add it there since it 
provides the necessary plumbing.


http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251
PS6, Line 251:
> IMPALA-1003 doesn't include many breadcrumbs. I do wonder if there were oth
I tried this on the store_sales table from 1TB tpcds/parquet dataset. The table 
has 22 non-partitioned columns and ~2.8B rows.

I ran the child query with and without null count and I noticed 7-8% slowdown. 
(I warmed up the cluster by running the queries until their runtime 
stabilized). ~64s without null count / ~70s with null count.

I don't see why "refresh" would be affected? (unless I'm missing something)



--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 06 Dec 2018 04:03:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 7:

It'd still be good to do a sanity check for performance, just on your dev 
machine, if possible, but that shouldn't block getting this in.


--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 06 Dec 2018 01:45:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 7: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251
PS6, Line 251:
> Agreed. Let's get this in, then tackle the NDV=0 as a separate issue.
IMPALA-1003 doesn't include many breadcrumbs. I do wonder if there were other, 
more severe, issues back when it was disabled - e.g. codegen was disabled for 
the whole aggregation or something.

I just quickly tried an experiment measuring the time added to the aggregation 
for each additional aggregate function of this nature and the delta from each 
added function is very small.



--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 06 Dec 2018 01:43:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-05 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 7: Code-Review+1

(2 comments)

My vote is to get this in, then do three things:

1. Use IMPALA-7842 to add cardinality tests based on this feature.
2. Do some refresh metadata performance runs to check performance impact.
3. Tackle the NDV-does-or-does-not-include-nulls issue.

http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG@16
PS7, Line 16: Tests: Updated the affected tests to include the null counts.
> Can we add a couple of tests that verify that cardinality estimates for out
FWIW, IMPALA-7842 provides a starter set of cardinality tests based on exposing 
the pre-Thrift plan tree. We can build on those if that patch goes in before 
this one.


http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251
PS6, Line 251:
> Was discussing this with Paul offline. We thought that adjusting the NDV be
Agreed. Let's get this in, then tackle the NDV=0 as a separate issue.

I wonder, do we have any data about the original issue: any performance 
slowness when adding this additional calculation? If a table has many columns, 
and we add a null count for each, how much impact is there on refresh metadata 
performance?



--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 05 Dec 2018 21:18:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG@16
PS7, Line 16: Tests: Updated the affected tests to include the null counts.
Can we add a couple of tests that verify that cardinality estimates for output 
of "IS NULL" and "IS NOT NULL" are correct?

It would also be good to do a sanity check of the compute stats performance 
change on a largish table. I'd expect the difference will be small but we 
should confirm.



--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 05 Dec 2018 18:11:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-05 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 6:

(1 comment)

Any reviews on this one? Planning to get this one in before others keep 
updating these tests.

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251
PS6, Line 251:   " IS NULL THEN 1 ELSE NULL END)");
> I thought you were doing it in a separate change? https://gerrit.cloudera.o
Was discussing this with Paul offline. We thought that adjusting the NDV before 
writing the stats to HMS is problematic since it might affect other SQL engines 
(like Presto/Hive etc). We don't know if they expect the null count to be 
adjusted already.



--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 05 Dec 2018 16:32:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1511/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 03 Dec 2018 17:49:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-03 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 7:

(1 comment)

Rebased. Core and exhaustive tests passed. PS7 is ready for review.

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251
PS6, Line 251:
> If we compute the null count, can we also solve the NDV=0 issue? Before sav
I thought you were doing it in a separate change? 
https://gerrit.cloudera.org/#/c/11528/

I think it is nice to do it as a separate change so that it is easy to revert 
if something blows up.



--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 03 Dec 2018 17:16:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-12-03 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#7) to the change originally 
created by piotr.findei...@gmail.com. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..

IMPALA-7659: Populate NULL count while computing column stats

It was disabled for performance reasons (IMPALA-1003) and this patch
re-enables it since a lot of codegen improvements have happened since
then.

This patch switches the aggregation to use the CASE conditional instead
of IF since the former has proper codegen support (IMPALA-7655).

Tests: Updated the affected tests to include the null counts.

Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-keywords.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M 
testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
13 files changed, 1,289 insertions(+), 1,304 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/11565/7
--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-11-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 6: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Sat, 01 Dec 2018 02:22:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-11-30 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251
PS6, Line 251:   " IS NULL THEN 1 ELSE NULL END)");
If we compute the null count, can we also solve the NDV=0 issue? Before saving 
the NDV, add one if the null count is greater than zero. This will ensure that 
the NDV count here matches the planner's expectation of NDV-including-nulls.



--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Sat, 01 Dec 2018 00:29:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-11-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1484/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 30 Nov 2018 23:02:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-11-30 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 6:

Reviewers, please hold off until I confirm that the core/exhaustive builds are 
green. If not, I'll probably have to patch a few more tests in next PS.


--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 30 Nov 2018 22:38:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-11-30 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 6:

(2 comments)

I fixed the tests based on the failures I noticed in a core run. I'll trigger 
another test run to see if it comes out green. Meanwhile, I'm also running an 
exhaustive run so that there are no surprises later.

http://gerrit.cloudera.org:8080/#/c/11565/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11565/5//COMMIT_MSG@10
PS5, Line 10: re-enables it since a lot of codegen improvements have happened 
since
> This fix would be helpful for IMPALA-7310: if NDV=0, we need to know if nul
Agreed.


http://gerrit.cloudera.org:8080/#/c/11565/5/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/5/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251
PS5, Line 251:   " IS NULL THEN 1 ELSE NULL END)");
> If this expression is faster than an if() function (IMPALA-7655), then does
Agree with you. I think it's better to do it as a separate change.



--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 30 Nov 2018 22:31:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-11-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3509/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 30 Nov 2018 22:31:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

2018-11-30 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#6) to the change originally 
created by piotr.findei...@gmail.com. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
..

IMPALA-7659: Populate NULL count while computing column stats

It was disabled for performance reasons (IMPALA-1003) and this patch
re-enables it since a lot of codegen improvements have happened since
then.

This patch switches the aggregation to use the CASE conditional instead
of IF since the former has proper codegen support (IMPALA-7655).

Tests: Updated the affected tests to include the null counts.

Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-keywords.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M 
testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
13 files changed, 1,291 insertions(+), 1,306 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/11565/6
--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac