[jira] [Commented] (IMPALA-13102) Loading tables with illegal stats failed

2024-05-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849072#comment-17849072
 ] 

ASF subversion and git services commented on IMPALA-13102:
--

Commit e35f8183cb1ba069ae00ee93e71451eccd505d0a in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e35f8183c ]

IMPALA-13102: Normalize invalid column stats from HMS

Column stats such as numDVs and numNulls in HMS can have arbitrary values.
Impala expects them to be non-negative, or -1 for unknown, so loading
tables with invalid stats values (< -1) fails.

This patch adds logic to normalize the stats values. If a value is less
than -1, it is replaced with -1 and a corresponding warning is logged. The
patch also refactors some redundant code in ColumnStats.

Tests:
 - Add e2e test

Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Reviewed-on: http://gerrit.cloudera.org:8080/21445
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
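
The normalization rule described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from the actual patch: the class name StatsNormalizerSketch, the method normalize, and the wording of the warning are all assumptions made for this example. Only the rule itself (values below -1 become -1, with a warning) comes from the commit message.

```java
import java.util.logging.Logger;

/** Illustrative sketch only; names here are hypothetical, not Impala's. */
final class StatsNormalizerSketch {
  private static final Logger LOG =
      Logger.getLogger(StatsNormalizerSketch.class.getName());

  /** Sentinel meaning "unknown", as described in the commit message. */
  static final long UNKNOWN = -1;

  /**
   * Returns the value unchanged when it is non-negative or -1.
   * Any value below -1 is replaced with -1 and a warning is logged.
   */
  static long normalize(String statName, long value) {
    if (value >= UNKNOWN) return value;
    LOG.warning(String.format(
        "Invalid %s value %d from HMS; using %d instead",
        statName, value, UNKNOWN));
    return UNKNOWN;
  }

  public static void main(String[] args) {
    System.out.println(normalize("numDVs", -100));  // prints -1
    System.out.println(normalize("numDVs", 7300));  // prints 7300
    System.out.println(normalize("numNulls", -1));  // prints -1
  }
}
```

Clamping to -1 rather than rejecting the value lets the table load, so statements such as DROP STATS can then be used to repair the stats.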


> Loading tables with illegal stats failed
> 
>
> Key: IMPALA-13102
> URL: https://issues.apache.org/jira/browse/IMPALA-13102
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> When a table has illegal stats, e.g. numDVs=-100, Impala can't load the 
> table, so DROP STATS or DROP TABLE can't be performed on the table.
> {code:sql}
> [localhost:21050] default> drop stats alltypes_bak;
> Query: drop stats alltypes_bak
> ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
> CAUSED BY: TableLoadingException: Failed to load metadata for table: 
> default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 
> avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
> numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}
> We should at least allow dropping the stats or dropping the table, so that 
> users can use Impala to recover the stats.
> Stacktrace in the logs:
> {noformat}
> I0520 08:00:56.661746 17543 jni-util.cc:321] 5343142d1173494f:44dcde8c] org.apache.impala.common.AnalysisException: Failed to load metadata for table: 'alltypes_bak'
> at org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
> at org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
> at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
> at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
> at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
> at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
> at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
> at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
> Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
> at org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
> at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
> at org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
> at org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
> at org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
> at org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
> at org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585)
> at org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
> at .: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318)
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213)
> at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145)
> at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251)
> at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPool

[jira] [Commented] (IMPALA-13102) Loading tables with illegal stats failed

2024-05-21 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848395#comment-17848395
 ] 

Quanlong Huang commented on IMPALA-13102:
-

Uploaded a patch for review: https://gerrit.cloudera.org/c/21445/

> Loading tables with illegal stats failed
> 
>
> Key: IMPALA-13102
> URL: https://issues.apache.org/jira/browse/IMPALA-13102
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> When a table has illegal stats, e.g. numDVs=-100, Impala can't load the 
> table, so DROP STATS or DROP TABLE can't be performed on the table.
> {code:sql}
> [localhost:21050] default> drop stats alltypes_bak;
> Query: drop stats alltypes_bak
> ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
> CAUSED BY: TableLoadingException: Failed to load metadata for table: 
> default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 
> avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
> numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}
> We should at least allow dropping the stats or dropping the table, so that 
> users can use Impala to recover the stats.
> Stacktrace in the logs:
> {noformat}
> I0520 08:00:56.661746 17543 jni-util.cc:321] 5343142d1173494f:44dcde8c] org.apache.impala.common.AnalysisException: Failed to load metadata for table: 'alltypes_bak'
> at org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
> at org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
> at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
> at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
> at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
> at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
> at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
> at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
> Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
> at org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
> at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
> at org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
> at org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
> at org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
> at org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
> at org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585)
> at org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
> at .: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318)
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213)
> at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145)
> at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251)
> at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
> at com.google.common.base.Preconditions.checkState(Preconditions.java:512)
> at org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:1034)
> at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:676)
> at org.apache.impala.catalog.Column.updateStats(Column.java:73)
> at org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:183)
> at org.apache.impa

[jira] [Commented] (IMPALA-13102) Loading tables with illegal stats failed

2024-05-19 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847742#comment-17847742
 ] 

Quanlong Huang commented on IMPALA-13102:
-

In the Impala dev environment, the invalid stats can be reproduced by setting them directly in PostgreSQL:
{code:sql}
psql -q -U hiveuser -d ${METASTORE_DB}

HMS_home_quanlong_workspace_Impala_cdp=> select "TBL_ID" from "TBLS" where 
"TBL_NAME" = 'alltypes_bak';
 TBL_ID 
--------
 244931
(1 row)
HMS_home_quanlong_workspace_Impala_cdp=>  select "CS_ID", "DB_NAME", 
"TABLE_NAME", "COLUMN_NAME", "NUM_DISTINCTS" from "TAB_COL_STATS" where 
"TBL_ID" = 244931;
 CS_ID | DB_NAME |  TABLE_NAME  |   COLUMN_NAME   | NUM_DISTINCTS 
-------+---------+--------------+-----------------+---------------
 68767 | default | alltypes_bak | double_col      |            10
 68766 | default | alltypes_bak | id              |          7300
 68765 | default | alltypes_bak | tinyint_col     |            10
 68764 | default | alltypes_bak | timestamp_col   |          7300
 68763 | default | alltypes_bak | smallint_col    |            10
 68762 | default | alltypes_bak | date_string_col |           736
 68761 | default | alltypes_bak | string_col      |            10
 68760 | default | alltypes_bak | float_col       |            10
 68759 | default | alltypes_bak | bigint_col      |            10
 68758 | default | alltypes_bak | year            |             2
 68757 | default | alltypes_bak | bool_col        |              
 68756 | default | alltypes_bak | int_col         |            10
(12 rows)
HMS_home_quanlong_workspace_Impala_cdp=> UPDATE "TAB_COL_STATS" SET 
"NUM_DISTINCTS" = -100 where "CS_ID" = 68766;
HMS_home_quanlong_workspace_Impala_cdp=> select "CS_ID", "DB_NAME", 
"TABLE_NAME", "COLUMN_NAME", "NUM_DISTINCTS" from "TAB_COL_STATS" where "CS_ID" 
= 68766;
 CS_ID | DB_NAME |  TABLE_NAME  | COLUMN_NAME | NUM_DISTINCTS 
-------+---------+--------------+-------------+---------------
 68766 | default | alltypes_bak | id          |          -100
(1 row)
{code}
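
As a rough illustration of why the updated row above blocks table loading, the sketch below mimics a state check that rejects any distinct-value count below -1. It is an assumption-laden stand-in, not Impala's ColumnStats code: the class StatsValidationSketch and its methods validate and canLoad are hypothetical names, and only the rule (non-negative or -1) and the exception type seen in the reported error are taken from the issue.

```java
/** Illustrative sketch only; names here are hypothetical, not Impala's. */
final class StatsValidationSketch {

  /**
   * Throws IllegalStateException unless numDistinct is non-negative
   * or -1 (unknown), mirroring the exception type in the reported error.
   */
  static void validate(long numDistinct) {
    if (numDistinct < -1) {
      throw new IllegalStateException("numDistinct_=" + numDistinct);
    }
  }

  /** Returns true if a column with this stat would pass the check above. */
  static boolean canLoad(long numDistinct) {
    try {
      validate(numDistinct);
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(canLoad(7300));  // prints true
    System.out.println(canLoad(-1));    // prints true
    System.out.println(canLoad(-100));  // prints false
  }
}
```

With the "id" column's NUM_DISTINCTS set to -100, a check of this kind fails during loading, which is why even DROP STATS cannot reach the table.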
