Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16098 )

Change subject: IMPALA-9744: Treat corrupt table stats as missing to avoid bad 
plans
......................................................................


Patch Set 17: Code-Review+1

(10 comments)

Done. Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/16098/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16098/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1178
PS14, Line 1178: if (partNumRows > -1) {
               :           if (partitionNumRows_ == -1) partitionNumRows_ = 0;
               :
> ping - this is still an issue. it should be
Done


http://gerrit.cloudera.org:8080/#/c/16098/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1200
PS14, Line 1200:  table size from those partitions with bad r
> but non-partitioned tables (e.g. getNumClusteringCols == 0), have a single
Let me double check. I recall the check for partitions is not enough for 
non-partitioned table.

Checked and found that hasCorruptTableStats_ is sufficient.

This test was removed.


http://gerrit.cloudera.org:8080/#/c/16098/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1201
PS14, Line 1201: n
> sorry, I meant shouldn't it check of any partitions have either corrupt or
The missing stats condition is numRows == -1L.


http://gerrit.cloudera.org:8080/#/c/16098/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1233
PS14, Line 1233:       hasCorruptTableStats_ = true;
               :
               :     return numRows;
               :   }
               :
               :   /**
> why is this needed? doesn't this defeat the purpose of estimating the numbe
Done


http://gerrit.cloudera.org:8080/#/c/16098/17/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16098/17/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1187
PS17, Line 1187:     if (partitionsWithCorruptOrMissingStats.size() == 0 && 
numPartitionsWithNumRows_ > 0)
               :       return partitionNumRows_;
> needs curly braces around the body of the if statement
Done


http://gerrit.cloudera.org:8080/#/c/16098/17/tests/metadata/test_compute_stats.py
File tests/metadata/test_compute_stats.py:

http://gerrit.cloudera.org:8080/#/c/16098/17/tests/metadata/test_compute_stats.py@174
PS17, Line 174:
> we need to add a test for the mixed case for partitioned tables - where som
Add one partition for the partitioned table test. So we have one partition with 
corrupted stats, and one partition with missing stats.

I am not aware of a method to load a partition with good stats :-).


http://gerrit.cloudera.org:8080/#/c/16098/17/tests/metadata/test_compute_stats.py@180
PS17, Line 180: Hive
> entire test name should be all lower case
Done


http://gerrit.cloudera.org:8080/#/c/16098/17/tests/metadata/test_compute_stats.py@197
PS17, Line 197:       CREATE TABLE {0}.{1} (
              :         id int COMMENT 'Add a comment',
              :         bool_col boolean,
              :         tinyint_col tinyint,
              :         smallint_col smallint,
              :         int_col int,
              :         bigint_col bigint,
              :         float_col float,
              :         double_col double,
              :         date_string_col string,
              :         string_col string,
              :         timestamp_col timestamp,
              :         year int,
              :         month int )
              :         PARTITIONED BY (decade string)
              :         STORED AS PARQUET;
> I know we discussed this before, but I still don't understand why you can't
Both new tests are positive ones in that each checks for positive valued row 
count. In the case of Impala tables, it is free of the bug out of box. There is 
no need for the fix.


http://gerrit.cloudera.org:8080/#/c/16098/17/tests/metadata/test_compute_stats.py@231
PS17, Line 231:          assert("cardinality=[1-9][0-9]*" in 
explain_result.data[i + 2])
> there should be some validation of the output for "partitions: %s/%s rows=%
Done


http://gerrit.cloudera.org:8080/#/c/16098/17/tests/metadata/test_compute_stats.py@247
PS17, Line 247:     # Load from a local data file
> most of the code in this test method and the above is the same, it should b
Done



--
To view, visit http://gerrit.cloudera.org:8080/16098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Gerrit-Change-Number: 16098
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Jul 2020 22:18:50 +0000
Gerrit-HasComments: Yes

Reply via email to