Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 7: Code-Review+1

(2 comments)

My vote is to get this in, then do three things:

1. Use IMPALA-7842 to add cardinality tests based on this feature.
2. Run some refresh metadata performance tests to measure the impact.
3. Tackle the NDV-does-or-does-not-include-nulls issue.

http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG@16
PS7, Line 16: Tests: Updated the affected tests to include the null counts.
> Can we add a couple of tests that verify that cardinality estimates for out
FWIW, IMPALA-7842 provides a starter set of cardinality tests based on exposing 
the pre-Thrift plan tree. We can build on those if that patch goes in before 
this one.


http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251
PS6, Line 251:
> Was discussing this with Paul offline. We thought that adjusting the NDV be
Agreed. Let's get this in, then tackle the NDV=0 case as a separate issue.

Do we have any data on the original concern, namely whether this additional
calculation causes a performance slowdown? If a table has many columns and we
add a null count for each, how much does that affect refresh metadata
performance?
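
To make the question concrete, here is a rough sketch of how the column-stats
select list grows when every column gets a null-count expression next to its
NDV. This is not the actual ComputeStatsStmt code: the class and method names
are hypothetical, the COUNT(CASE ...) form of the null count is an assumption,
and identifier quoting is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not taken from ComputeStatsStmt.java.
final class ColumnStatsQuerySketch {
  // Emits two aggregate expressions per column: the NDV and an assumed
  // null-count expression. The select list is therefore twice the column count.
  static List<String> selectList(List<String> columns) {
    List<String> exprs = new ArrayList<>();
    for (String col : columns) {
      exprs.add("NDV(" + col + ")");
      exprs.add("COUNT(CASE WHEN " + col + " IS NULL THEN 1 ELSE NULL END)");
    }
    return exprs;
  }

  // Assembles the full stats query for the given table and columns.
  static String query(String table, List<String> columns) {
    return "SELECT " + String.join(", ", selectList(columns)) + " FROM " + table;
  }

  public static void main(String[] args) {
    System.out.println(query("t", Arrays.asList("a", "b")));
  }
}
```

Under these assumptions, a table with N columns goes from N to 2N aggregate
expressions in a single scan, which is why a measurement on a wide table would
be the interesting data point.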



--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <piotr.findei...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <piotr.findei...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <par0...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Dec 2018 21:18:47 +0000
Gerrit-HasComments: Yes
