[ https://issues.apache.org/jira/browse/IMPALA-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fang-Yu Rao resolved IMPALA-11666.
----------------------------------
    Resolution: Fixed

Resolving this JIRA since the patch has been merged.

> Consider revising the warning message when hasCorruptTableStats_ is true for a table
> ------------------------------------------------------------------------------------
>
>                 Key: IMPALA-11666
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11666
>             Project: IMPALA
>          Issue Type: Task
>          Components: Frontend
>            Reporter: Fang-Yu Rao
>            Assignee: Fang-Yu Rao
>            Priority: Major
>
> Currently, {{hasCorruptTableStats_}} of an HDFS table is set to true when
> one of the following holds in
> [HdfsScanNode.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java].
> # Its {{cardinality_}} is less than -1.
> # The number of rows in one of its partitions is less than -1.
> # The number of rows in one of its partitions is 0 but the size of the
> associated files of this partition is greater than 0.
> # The number of rows in the table is 0 but the size of the associated files
> of this table is greater than 0.
> For such a table, the output of the {{EXPLAIN}} statement for queries
> involving the table contains the message "{{WARNING: The following tables
> have potentially corrupt table statistics. Drop and re-compute statistics to
> resolve this problem.}}"
> The warning message may be too alarming for an Impala user, especially since
> Impala's frontend can set {{hasCorruptTableStats_}} to true for a table
> whose statistics are not corrupt.
> Specifically, such a table can be created as follows after starting the
> Impala cluster.
> # Execute "{{beeline -u "jdbc:hive2://localhost:11050/default"}}" on the
> command line to enter beeline.
> # Create a transactional table in beeline via "{{create table
> test_db.test_tbl_01 (id int, name string) stored as orc tblproperties
> ('transactional'='true')}}".
> # Insert a row into the table just created in beeline via "{{insert into
> table test_db.test_tbl_01 values (1, "Alex");}}".
> # Delete the row just inserted in beeline via "{{delete from
> test_db.test_tbl_01 where id = 1}}".
> # In Impala shell, execute "{{compute stats test_db.test_tbl_01}}".
> # In Impala shell, execute "{{explain select * from test_db.test_tbl_01}}"
> to verify that the warning message described above appears in the output.
> The table {{test_tbl_01}} above has 0 rows but the associated file size is
> greater than 0.
> It may be better to revise the warning message to something less alarming,
> as shown below.
> {code:java}
> The number of rows in the following tables or in a partition of them has 0 or
> fewer than -1 row but positive total file size.
> This does not necessarily imply the existence of corrupt statistics.
> In the case of corrupt statistics, drop and re-compute statistics could
> resolve this problem.
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
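
The four conditions listed in the issue description can be sketched as a small standalone Java check. This is only an illustration of the described logic, not the code in HdfsScanNode.java: the class {{CorruptStatsSketch}}, the {{PartitionStats}} holder, and both method names are hypothetical, and the sketch assumes that the table-level row count and {{cardinality_}} are the same number.

```java
import java.util.List;

// Illustrative sketch only; every name here is hypothetical and not taken from Impala.
final class CorruptStatsSketch {

    // Hypothetical holder for one partition's row count and total file size in bytes.
    static final class PartitionStats {
        final long numRows;
        final long fileBytes;

        PartitionStats(long numRows, long fileBytes) {
            this.numRows = numRows;
            this.fileBytes = fileBytes;
        }
    }

    // A row count below -1 is flagged, as is a row count of 0 with a positive file size.
    // A row count of exactly -1 is not flagged by any of the four conditions.
    static boolean looksCorrupt(long numRows, long fileBytes) {
        return numRows < -1 || (numRows == 0 && fileBytes > 0);
    }

    // Conditions 1 and 4 apply to the table as a whole, conditions 2 and 3 to each partition.
    static boolean hasCorruptTableStats(
            long tableNumRows, long tableFileBytes, List<PartitionStats> partitions) {
        if (looksCorrupt(tableNumRows, tableFileBytes)) {
            return true;
        }
        for (PartitionStats p : partitions) {
            if (looksCorrupt(p.numRows, p.fileBytes)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Mirrors the reproduction steps: 0 rows after the delete, but files remain on disk.
        // The file size of 512 bytes is an arbitrary positive value chosen for the example.
        System.out.println(hasCorruptTableStats(0L, 512L, List.of())); // prints "true"
    }
}
```

In this sketch the table from the reproduction steps is flagged even though its statistics are accurate, which is the situation the revised warning message is meant to cover.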