Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16098 )
Change subject: IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans ...................................................................... Patch Set 17: Code-Review+1 (10 comments) Done. Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/16098/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/16098/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1178 PS14, Line 1178: if (partNumRows > -1) { : if (partitionNumRows_ == -1) partitionNumRows_ = 0; : > ping - this is still an issue. it should be Done http://gerrit.cloudera.org:8080/#/c/16098/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1200 PS14, Line 1200: table size from those partitions with bad r > but non-partitioned tables (e.g. getNumClusteringCols == 0), have a single Let me double check. I recall the check for partitions is not enough for non-partitioned table. Checked and found that hasCorruptTableStats_ is sufficient. This test was removed. http://gerrit.cloudera.org:8080/#/c/16098/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1201 PS14, Line 1201: n > sorry, I meant shouldn't it check of any partitions have either corrupt or The missing stats condition is numRows == -1L. http://gerrit.cloudera.org:8080/#/c/16098/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1233 PS14, Line 1233: hasCorruptTableStats_ = true; : : return numRows; : } : : /** > why is this needed? doesn't this defeat the purpose of estimating the numbe Done http://gerrit.cloudera.org:8080/#/c/16098/17/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/16098/17/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1187 PS17, Line 1187: if (partitionsWithCorruptOrMissingStats.size() == 0 && numPartitionsWithNumRows_ > 0) : return partitionNumRows_; > needs curly braces around the body of the if statement Done http://gerrit.cloudera.org:8080/#/c/16098/17/tests/metadata/test_compute_stats.py File tests/metadata/test_compute_stats.py: http://gerrit.cloudera.org:8080/#/c/16098/17/tests/metadata/test_compute_stats.py@174 PS17, Line 174: > we need to add a test for the mixed case for partitioned tables - where som Add one partition for the partitioned table test. So we have one partition with corrupted stats, and one partition with missing stats. I am not aware of a method to load a partition with good stats :-). http://gerrit.cloudera.org:8080/#/c/16098/17/tests/metadata/test_compute_stats.py@180 PS17, Line 180: Hive > entire test name should be all lower case Done http://gerrit.cloudera.org:8080/#/c/16098/17/tests/metadata/test_compute_stats.py@197 PS17, Line 197: CREATE TABLE {0}.{1} ( : id int COMMENT 'Add a comment', : bool_col boolean, : tinyint_col tinyint, : smallint_col smallint, : int_col int, : bigint_col bigint, : float_col float, : double_col double, : date_string_col string, : string_col string, : timestamp_col timestamp, : year int, : month int ) : PARTITIONED BY (decade string) : STORED AS PARQUET; > I know we discussed this before, but I still don't understand why you can't Both new tests are positive ones in that each checks for positive valued row count. In the case of Impala tables, it is free of the bug out of box. There is no need for the fix. http://gerrit.cloudera.org:8080/#/c/16098/17/tests/metadata/test_compute_stats.py@231 PS17, Line 231: assert("cardinality=[1-9][0-9]*" in explain_result.data[i + 2]) > there should be some validation of the output for "partitions: %s/%s rows=% Done http://gerrit.cloudera.org:8080/#/c/16098/17/tests/metadata/test_compute_stats.py@247 PS17, Line 247: # Load from a local data file > most of the code in this test method and the above is the same, it should b Done -- To view, visit http://gerrit.cloudera.org:8080/16098 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576 Gerrit-Change-Number: 16098 Gerrit-PatchSet: 17 Gerrit-Owner: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Fri, 17 Jul 2020 22:18:50 +0000 Gerrit-HasComments: Yes