[jira] [Commented] (IMPALA-13102) Loading tables with illegal stats failed
[ https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849072#comment-17849072 ]

ASF subversion and git services commented on IMPALA-13102:
----------------------------------------------------------

Commit e35f8183cb1ba069ae00ee93e71451eccd505d0a in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e35f8183c ]

IMPALA-13102: Normalize invalid column stats from HMS

Column stats like numDVs and numNulls in HMS could have arbitrary values.
Impala expects them to be non-negative, or -1 for unknown, so loading
tables with invalid stats values (< -1) fails.

This patch adds logic to normalize the stats values: if a value is < -1,
-1 is used instead and a corresponding warning is logged. It also
refactors some redundant code in ColumnStats.

Tests:
 - Add e2e test

Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Reviewed-on: http://gerrit.cloudera.org:8080/21445
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> Loading tables with illegal stats failed
> ----------------------------------------
>
>                 Key: IMPALA-13102
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13102
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> When the table has illegal stats, e.g. numDVs=-100, Impala can't load the
> table, so DROP STATS or DROP TABLE can't be performed on the table.
> {code:sql}
> [localhost:21050] default> drop stats alltypes_bak;
> Query: drop stats alltypes_bak
> ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
> CAUSED BY: TableLoadingException: Failed to load metadata for table: default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}
> We should allow at least dropping the stats or dropping the table, so users
> can use Impala to recover the stats.
> Stacktrace in the logs:
> {noformat}
> I0520 08:00:56.661746 17543 jni-util.cc:321] 5343142d1173494f:44dcde8c] org.apache.impala.common.AnalysisException: Failed to load metadata for table: 'alltypes_bak'
>         at org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
>         at org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
>         at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
>         at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
>         at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
>         at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
>         at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
>         at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
> Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
>         at org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
>         at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
>         at org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
>         at org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
>         at org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
>         at org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
>         at org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585)
>         at org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
>         at .: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
>         at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318)
>         at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213)
>         at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145)
>         at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251)
>         at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPool
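
The normalization rule from the commit message above (a value below -1 becomes -1, and a warning is logged) might look roughly like the following. This is only a minimal sketch, not the code from the patch: the class name StatsNormalizer, the method normalize, and the use of java.util.logging are all assumptions made for illustration.

```java
import java.util.logging.Logger;

/** Hypothetical helper for illustration; this is NOT Impala's ColumnStats code. */
final class StatsNormalizer {
  private static final Logger LOG = Logger.getLogger(StatsNormalizer.class.getName());

  /** Sentinel meaning "unknown", per the commit message. */
  static final long UNKNOWN = -1;

  private StatsNormalizer() {}

  /**
   * Returns the value unchanged if it is non-negative or already -1.
   * Any value below -1 is treated as invalid: a warning is logged and
   * -1 (unknown) is returned instead.
   */
  static long normalize(String statName, long value) {
    if (value < UNKNOWN) {
      LOG.warning(String.format(
          "Invalid %s value %d from HMS; using %d (unknown) instead",
          statName, value, UNKNOWN));
      return UNKNOWN;
    }
    return value;
  }

  public static void main(String[] args) {
    // The value from the bug report is mapped to "unknown".
    System.out.println(normalize("numDVs", -100));
    // Legal values pass through untouched.
    System.out.println(normalize("numNulls", 0));
    System.out.println(normalize("numDVs", 7300));
  }
}
```

With a rule of this shape the table can still be loaded, and the affected stat is simply reported as unknown until the stats are dropped or recomputed.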
[jira] [Commented] (IMPALA-13102) Loading tables with illegal stats failed
[ https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848395#comment-17848395 ]

Quanlong Huang commented on IMPALA-13102:
-----------------------------------------

Uploaded a patch for review: https://gerrit.cloudera.org/c/21445/

> Loading tables with illegal stats failed
> ----------------------------------------
>
>                 Key: IMPALA-13102
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13102
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> When the table has illegal stats, e.g. numDVs=-100, Impala can't load the
> table, so DROP STATS or DROP TABLE can't be performed on the table.
> {code:sql}
> [localhost:21050] default> drop stats alltypes_bak;
> Query: drop stats alltypes_bak
> ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
> CAUSED BY: TableLoadingException: Failed to load metadata for table: default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}
> We should allow at least dropping the stats or dropping the table, so users
> can use Impala to recover the stats.
> Stacktrace in the logs:
> {noformat}
> I0520 08:00:56.661746 17543 jni-util.cc:321] 5343142d1173494f:44dcde8c] org.apache.impala.common.AnalysisException: Failed to load metadata for table: 'alltypes_bak'
>         at org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
>         at org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
>         at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
>         at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
>         at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
>         at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
>         at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
>         at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
> Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
>         at org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
>         at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
>         at org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
>         at org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
>         at org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
>         at org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
>         at org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585)
>         at org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
>         at .: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
>         at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318)
>         at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213)
>         at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145)
>         at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251)
>         at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:512)
>         at org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:1034)
>         at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:676)
>         at org.apache.impala.catalog.Column.updateStats(Column.java:73)
>         at org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:183)
>         at org.apache.impa
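
The innermost cause in the trace above is a state check in ColumnStats.validate that rejects numDistinct_=-100. As a rough illustration of why one bad value aborts the whole table load, here is a self-contained sketch of that kind of check. It rests on assumptions: StatsValidator and its validate method are hypothetical names, plain Java stands in for Guava's Preconditions.checkState, and the real check in Impala may cover more fields.

```java
/** Hypothetical stand-in for illustration; this is NOT Impala's ColumnStats.validate. */
final class StatsValidator {
  private StatsValidator() {}

  /**
   * Accepts only values that are non-negative or -1 (unknown).
   * Anything below -1 raises IllegalStateException, which in the trace
   * above propagates up and fails the table load.
   */
  static void validate(long numDistinct, long numNulls) {
    if (numDistinct < -1 || numNulls < -1) {
      throw new IllegalStateException(
          "ColumnStats{numDistinct_=" + numDistinct + ", numNulls_=" + numNulls + "}");
    }
  }

  public static void main(String[] args) {
    validate(7300, 0); // legal stats pass silently
    validate(-1, -1);  // "unknown" is legal as well
    try {
      validate(-100, 0); // the value from the bug report
    } catch (IllegalStateException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

Because a check like this throws instead of repairing the value, every statement that needs the table metadata fails, including DROP STATS itself.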
[jira] [Commented] (IMPALA-13102) Loading tables with illegal stats failed
[ https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847742#comment-17847742 ]

Quanlong Huang commented on IMPALA-13102:
-----------------------------------------

In the Impala dev env, I can set the stats directly in postgresql:
{code:sql}
psql -q -U hiveuser -d ${METASTORE_DB}

HMS_home_quanlong_workspace_Impala_cdp=> select "TBL_ID" from "TBLS" where "TBL_NAME" = 'alltypes_bak';
 TBL_ID
--------
 244931
(1 row)

HMS_home_quanlong_workspace_Impala_cdp=> select "CS_ID", "DB_NAME", "TABLE_NAME", "COLUMN_NAME", "NUM_DISTINCTS" from "TAB_COL_STATS" where "TBL_ID" = 244931;
 CS_ID | DB_NAME |  TABLE_NAME  |   COLUMN_NAME   | NUM_DISTINCTS
-------+---------+--------------+-----------------+---------------
 68767 | default | alltypes_bak | double_col      |            10
 68766 | default | alltypes_bak | id              |          7300
 68765 | default | alltypes_bak | tinyint_col     |            10
 68764 | default | alltypes_bak | timestamp_col   |          7300
 68763 | default | alltypes_bak | smallint_col    |            10
 68762 | default | alltypes_bak | date_string_col |           736
 68761 | default | alltypes_bak | string_col      |            10
 68760 | default | alltypes_bak | float_col       |            10
 68759 | default | alltypes_bak | bigint_col      |            10
 68758 | default | alltypes_bak | year            |             2
 68757 | default | alltypes_bak | bool_col        |
 68756 | default | alltypes_bak | int_col         |            10
(12 rows)

HMS_home_quanlong_workspace_Impala_cdp=> UPDATE "TAB_COL_STATS" SET "NUM_DISTINCTS" = -100 where "CS_ID" = 68766;

HMS_home_quanlong_workspace_Impala_cdp=> select "CS_ID", "DB_NAME", "TABLE_NAME", "COLUMN_NAME", "NUM_DISTINCTS" from "TAB_COL_STATS" where "CS_ID" = 68766;
 CS_ID | DB_NAME |  TABLE_NAME  | COLUMN_NAME | NUM_DISTINCTS
-------+---------+--------------+-------------+---------------
 68766 | default | alltypes_bak | id          |          -100
(1 row)
{code}

> Loading tables with illegal stats failed
> ----------------------------------------
>
>                 Key: IMPALA-13102
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13102
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> When the table has illegal stats, e.g. numDVs=-100, Impala can't load the
> table, so DROP STATS or DROP TABLE can't be performed on the table.
> {code:sql}
> [localhost:21050] default> drop stats alltypes_bak;
> Query: drop stats alltypes_bak
> ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
> CAUSED BY: TableLoadingException: Failed to load metadata for table: default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}
> We should allow at least dropping the stats or dropping the table, so users
> can use Impala to recover the stats.
> Stacktrace in the logs:
> {noformat}
> I0520 08:00:56.661746 17543 jni-util.cc:321] 5343142d1173494f:44dcde8c] org.apache.impala.common.AnalysisException: Failed to load metadata for table: 'alltypes_bak'
>         at org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
>         at org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
>         at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
>         at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
>         at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
>         at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
>         at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
>         at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
> Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
>         at org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
>         at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
>         at org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
>         at org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
>         at org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
>         at org.apache.impala.service.FeCatalo