[jira] [Resolved] (IMPALA-13102) Loading tables with illegal stats failed

2024-05-23 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-13102.
-
Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Loading tables with illegal stats failed
> 
>
> Key: IMPALA-13102
> URL: https://issues.apache.org/jira/browse/IMPALA-13102
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> When the table has illegal stats, e.g. numDVs=-100, Impala can't load the 
> table, so DROP STATS or DROP TABLE can't be performed on the table.
> {code:sql}
> [localhost:21050] default> drop stats alltypes_bak;
> Query: drop stats alltypes_bak
> ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
> CAUSED BY: TableLoadingException: Failed to load metadata for table: 
> default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 
> avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
> numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}
> We should at least allow dropping the stats or dropping the table, so users 
> can use Impala to recover the stats.
> Stacktrace in the logs:
> {noformat}
> I0520 08:00:56.661746 17543 jni-util.cc:321] 
> 5343142d1173494f:44dcde8c] 
> org.apache.impala.common.AnalysisException: Failed to load metadata for 
> table: 'alltypes_bak'
> at 
> org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
> at 
> org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
> at 
> org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
> at 
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
> at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
> at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
> Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load 
> metadata for table: default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 
> avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
> numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
> at 
> org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
> at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
> at 
> org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
> at 
> org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
> at 
> org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
> at 
> org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
> at 
> org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585)
> at 
> org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
> at .: 
> org.apache.impala.catalog.TableLoadingException: Failed to load metadata for 
> table: default.alltypes_bak
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318)
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213)
> at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145)
> at 
> org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251)
> at 
> org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=4.0, 
> avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
> numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:512)
> at 
> org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:1034)
> at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:676)
> at org.apache.impala.catalog.Column.updateStats(Column.java:73)
> at 
> org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:183)
> at 

[jira] [Commented] (IMPALA-13034) Add logs for slow HTTP requests dumping the profile

2024-05-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849073#comment-17849073
 ] 

ASF subversion and git services commented on IMPALA-13034:
--

Commit b975165a0acfe37af302dd7c007360633df54917 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b975165a0 ]

IMPALA-13034: Add logs and counters for HTTP profile requests blocking client 
fetches

There are several endpoints in WebUI that can dump a query profile:
/query_profile, /query_profile_encoded, /query_profile_plain_text,
/query_profile_json. The HTTP handler thread goes into
ImpalaServer::GetRuntimeProfileOutput() which acquires lock of the
ClientRequestState. This could block client requests in fetching query
results.

To help identify this issue, this patch adds warning logs when such
profile dumping requests run slow and the query is still in-flight. Also
adds a profile counter, GetInFlightProfileTimeStats, for the summary
stats of this time. Dumping the profiles after the query is archived
(e.g. closed) won't be tracked.

Logs for slow http responses are also added. The thresholds are defined
by two new flags, slow_profile_dump_warning_threshold_ms, and
slow_http_response_warning_threshold_ms.

Note that dumping the profile in-flight won't always block the query,
e.g. if there are no client fetch requests or if the coordinator
fragment is idle waiting for executor fragment instances. So a long time
shown in GetInFlightProfileTimeStats doesn't mean it's hitting the
issue.

To better identify this issue, this patch adds another profile counter,
ClientFetchLockWaitTimer, as the cumulative time client fetch requests
spend waiting for locks.
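The idea behind ClientFetchLockWaitTimer can be sketched in Java (Impala's backend is C++; the class, field, and method names below are illustrative only, not the actual implementation):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

class LockWaitTracking {
  // Cumulative nanoseconds fetch threads spent blocked on the lock.
  static final AtomicLong clientFetchLockWaitNanos = new AtomicLong();
  // Stand-in for the ClientRequestState lock shared with profile dumps.
  static final ReentrantLock requestStateLock = new ReentrantLock();

  static void fetchResults() {
    long start = System.nanoTime();
    requestStateLock.lock();  // may block behind a slow profile dump
    clientFetchLockWaitNanos.addAndGet(System.nanoTime() - start);
    try {
      // ... read query results under the lock ...
    } finally {
      requestStateLock.unlock();
    }
  }
}
```

A large cumulative value in such a counter points at fetches being serialized behind profile requests, which is exactly the contention the patch tries to surface.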

Also fixes false positive logs for complaining invalid query handles.
Such logs are added in GetQueryHandle() when the query is not found in
the active query map, but it could still exist in the query log. This
removes the logs in GetQueryHandle() and lets the callers decide whether
to log the error.

Tests:
 - Added e2e test
 - Ran CORE tests

Change-Id: I538ebe914f70f460bc8412770a8f7a1cc8b505dc
Reviewed-on: http://gerrit.cloudera.org:8080/21412
Reviewed-by: Impala Public Jenkins 
Tested-by: Michael Smith 


> Add logs for slow HTTP requests dumping the profile
> ---
>
> Key: IMPALA-13034
> URL: https://issues.apache.org/jira/browse/IMPALA-13034
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> There are several endpoints in WebUI that can dump a query profile: 
> /query_profile, /query_profile_encoded, /query_profile_plain_text, 
> /query_profile_json
> The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput(), 
> which acquires the lock of the ClientRequestState. This could block client 
> requests fetching query results. We should add warning logs when such HTTP 
> requests run slow (e.g. when the profile is too large to download in a short 
> time). The IP address and other info of such requests should also be logged.
> Related code:
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-server.cc#L736
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-beeswax-server.cc#L601
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-hs2-server.cc#L207



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13102) Loading tables with illegal stats failed

2024-05-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849072#comment-17849072
 ] 

ASF subversion and git services commented on IMPALA-13102:
--

Commit e35f8183cb1ba069ae00ee93e71451eccd505d0a in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e35f8183c ]

IMPALA-13102: Normalize invalid column stats from HMS

Column stats like numDVs, numNulls in HMS could have arbitrary values.
Impala expects them to be non-negative or -1 for unknown. So loading
tables with invalid stats values (<-1) will fail.

This patch adds logic to normalize the stats values: if a value is < -1,
it is replaced with -1 and a corresponding warning is logged. Some
redundant code in ColumnStats is also refactored.
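The normalization rule can be sketched as follows (a minimal illustration of the behavior described above; the class and method names are hypothetical, not Impala's actual ColumnStats code):

```java
public class StatsNormalization {
  /**
   * Impala treats -1 as "unknown"; any value below -1 (e.g. numDVs=-100
   * coming from HMS) is invalid and normalized to -1. The real patch
   * also logs a warning identifying the affected column.
   */
  static long normalize(long value) {
    return value < -1 ? -1 : value;
  }
}
```

With this in place, a table carrying numDVs=-100 loads with numDVs treated as unknown instead of failing the precondition check in ColumnStats.validate().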

Tests:
 - Add e2e test

Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Reviewed-on: http://gerrit.cloudera.org:8080/21445
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Loading tables with illegal stats failed
> 
>
> Key: IMPALA-13102
> URL: https://issues.apache.org/jira/browse/IMPALA-13102
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> When the table has illegal stats, e.g. numDVs=-100, Impala can't load the 
> table, so DROP STATS or DROP TABLE can't be performed on the table.
> {code:sql}
> [localhost:21050] default> drop stats alltypes_bak;
> Query: drop stats alltypes_bak
> ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
> CAUSED BY: TableLoadingException: Failed to load metadata for table: 
> default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 
> avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
> numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}
> We should at least allow dropping the stats or dropping the table, so users 
> can use Impala to recover the stats.
> Stacktrace in the logs:
> {noformat}
> I0520 08:00:56.661746 17543 jni-util.cc:321] 
> 5343142d1173494f:44dcde8c] 
> org.apache.impala.common.AnalysisException: Failed to load metadata for 
> table: 'alltypes_bak'
> at 
> org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
> at 
> org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
> at 
> org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
> at 
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
> at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
> at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
> Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load 
> metadata for table: default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 
> avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
> numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
> at 
> org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
> at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
> at 
> org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
> at 
> org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
> at 
> org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
> at 
> org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
> at 
> org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585)
> at 
> org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
> at .: 
> org.apache.impala.catalog.TableLoadingException: Failed to load metadata for 
> table: default.alltypes_bak
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318)
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213)
> at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145)
> at 
> org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251)
> at 
> org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> 

[jira] [Resolved] (IMPALA-13034) Add logs for slow HTTP requests dumping the profile

2024-05-23 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13034.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Add logs for slow HTTP requests dumping the profile
> ---
>
> Key: IMPALA-13034
> URL: https://issues.apache.org/jira/browse/IMPALA-13034
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> There are several endpoints in WebUI that can dump a query profile: 
> /query_profile, /query_profile_encoded, /query_profile_plain_text, 
> /query_profile_json
> The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput(), 
> which acquires the lock of the ClientRequestState. This could block client 
> requests fetching query results. We should add warning logs when such HTTP 
> requests run slow (e.g. when the profile is too large to download in a short 
> time). The IP address and other info of such requests should also be logged.
> Related code:
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-server.cc#L736
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-beeswax-server.cc#L601
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-hs2-server.cc#L207







[jira] [Resolved] (IMPALA-13083) Clarify REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error message

2024-05-23 Thread Riza Suminto (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto resolved IMPALA-13083.
---
Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Clarify REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error message
> --
>
> Key: IMPALA-13083
> URL: https://issues.apache.org/jira/browse/IMPALA-13083
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> The REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error message is too vague for 
> users/administrators to make the necessary adjustments to run a query that is 
> rejected by the admission controller.
> {code:java}
> const string REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION =
> "minimum memory reservation is greater than memory available to the query 
> for buffer "
> "reservations. Memory reservation needed given the current plan: $0. 
> Adjust either "
> "the mem_limit or the pool config (max-query-mem-limit, 
> min-query-mem-limit) for the "
> "query to allow the query memory limit to be at least $1. Note that 
> changing the "
> "mem_limit may also change the plan. See the query profile for more 
> information "
> "about the per-node memory requirements.";
> {code}
> There are many configs and options that directly and indirectly clamp 
> schedule.per_backend_mem_limit() and schedule.per_backend_mem_to_admit().
> [https://github.com/apache/impala/blob/3b35ddc8ca7b0e540fc16c413a170a25e164462b/be/src/scheduling/schedule-state.cc#L262-L361]
> Ideally, this error message should clearly mention which query option / llama 
> config / backend flag influences the per_backend_mem_limit decision so that 
> users can directly adjust that config. It should also clearly mention the 
> 'Per Host Min Memory Reservation' info string in the query profile 
> instead of just 'per-node memory requirements'.







[jira] [Comment Edited] (IMPALA-13106) Support larger imported query profile sizes through compression

2024-05-23 Thread Surya Hebbar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848947#comment-17848947
 ] 

Surya Hebbar edited comment on IMPALA-13106 at 5/23/24 1:01 PM:


After thorough research, I have found an efficient zlib port written in 
client-side JavaScript called "pako".

This supports changing compression levels, which is helpful for controlling 
the speed of compression.


was (Author: JIRAUSER299620):
After thorough research, I have found an efficient zlip port written in client 
side javascript called "pako".

This supports changing compression levels, which is helpful for having control 
over speed of compression.

> Support larger imported query profile sizes through compression
> ---
>
> Key: IMPALA-13106
> URL: https://issues.apache.org/jira/browse/IMPALA-13106
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
>
> Imported query profiles are currently being stored in IndexedDB.
> Although IndexedDB does not have storage limitations like other browser 
> storage APIs, there is a limit on the data that can be stored in one 
> attribute/field.
> This imposes a limitation on the size of query profiles. After some testing, 
> I have found this limit to be around 220 MB.
> So, it would be helpful to use compression on JSON query profiles, allowing 
> for much larger query profiles.






[jira] [Commented] (IMPALA-13106) Support larger imported query profile sizes through compression

2024-05-23 Thread Surya Hebbar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848947#comment-17848947
 ] 

Surya Hebbar commented on IMPALA-13106:
---

After thorough research, I have found an efficient zlib port written in 
client-side JavaScript called "pako".

This supports changing compression levels, which is helpful for controlling 
the speed of compression.
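pako implements the zlib format, so the space saving the proposal relies on can be sketched with Java's built-in zlib support (illustrative only; the WebUI itself would run pako in the browser, and the class and method names below are hypothetical):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

class ProfileCompressionSketch {
  /** Compresses a JSON profile with zlib; level 1..9 trades speed vs size. */
  static byte[] compress(byte[] json, int level) {
    Deflater deflater = new Deflater(level);
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (DeflaterOutputStream out = new DeflaterOutputStream(bos, deflater)) {
      out.write(json);
    } catch (IOException e) {
      throw new RuntimeException(e);  // cannot happen for in-memory streams
    }
    return bos.toByteArray();
  }
}
```

Repetitive JSON profiles compress well, so storing the compressed bytes in a single IndexedDB field should allow profiles far larger than the observed per-field limit.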

> Support larger imported query profile sizes through compression
> ---
>
> Key: IMPALA-13106
> URL: https://issues.apache.org/jira/browse/IMPALA-13106
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
>
> Imported query profiles are currently being stored in IndexedDB.
> Although IndexedDB does not have storage limitations like other browser 
> storage APIs, there is a limit on the data that can be stored in one 
> attribute/field.
> This imposes a limitation on the size of query profiles. After some testing, 
> I have found this limit to be around 220 MB.
> So, it would be helpful to use compression on JSON query profiles, allowing 
> for much larger query profiles.






[jira] [Commented] (IMPALA-11512) BINARY support in Iceberg

2024-05-23 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848937#comment-17848937
 ] 

Csaba Ringhofer commented on IMPALA-11512:
--

BINARY columns seem to work with Iceberg, but test coverage seems very limited. 
I didn't find any tests with a partition spec on BINARY columns.

> BINARY support in Iceberg
> -
>
> Key: IMPALA-11512
> URL: https://issues.apache.org/jira/browse/IMPALA-11512
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Csaba Ringhofer
>Priority: Major
>  Labels: impala-iceberg
>







[jira] [Commented] (IMPALA-11735) Handle CREATE_TABLE event when the db is invisible to the impala server user

2024-05-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848916#comment-17848916
 ] 

ASF subversion and git services commented on IMPALA-11735:
--

Commit 9672312015be959360795a8af0843fdf386b557c in impala's branch 
refs/heads/master from Sai Hemanth Gantasala
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=967231201 ]

IMPALA-11735: Handle CREATE_TABLE event when the db is invisible to the
impala server user

It's possible that some dbs are invisible to the Impala cluster due to
authorization restrictions. However, the CREATE_TABLE events in such
dbs will put the event processor into the ERROR state. The event
processor should ignore such CREATE_TABLE events when the database is
not found.

note: This is an incorrect setup, where 'impala' super user is denied
access on the metadata object database but given access to fetch events
from notification log table of metastore.

Testing:
- Manually verified this on local cluster.
- Added automated unit test to verify the same.

Change-Id: I90275bb8c065fc5af61186901ac7e9839a68c43b
Reviewed-on: http://gerrit.cloudera.org:8080/21188
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
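The behavior described above can be sketched as follows (hypothetical types and names, not Impala's actual MetastoreEvents classes): a CREATE_TABLE event whose database is missing from the local catalog is skipped instead of failing event processing.

```java
import java.util.Map;
import java.util.logging.Logger;

class CreateTableEventSketch {
  static final Logger LOG = Logger.getLogger("EventProcessor");

  /** Returns true if the event was applied, false if it was skipped. */
  static boolean process(Map<String, Object> catalogDbs, String dbName,
      String tblName) {
    if (!catalogDbs.containsKey(dbName)) {
      // Db invisible to Impala (e.g. authorization restrictions):
      // log and ignore instead of throwing, so the event processor
      // does not enter the ERROR state.
      LOG.warning("Skipping CREATE_TABLE for " + dbName + "." + tblName
          + ": database not found in catalog");
      return false;
    }
    // ... add the table to the catalog ...
    return true;
  }
}
```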


> Handle CREATE_TABLE event when the db is invisible to the impala server user
> 
>
> Key: IMPALA-11735
> URL: https://issues.apache.org/jira/browse/IMPALA-11735
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Sai Hemanth Gantasala
>Priority: Critical
>
> It's possible that some dbs are invisible to the Impala cluster due to 
> authorization restrictions. However, the CREATE_TABLE events in such dbs will 
> put the event processor into the ERROR state:
> {noformat}
> E1026 03:02:30.650302 116774 MetastoreEventsProcessor.java:684] Unexpected 
> exception received while processing event
> Java exception follows:
> org.apache.impala.catalog.events.MetastoreNotificationException: EventId: 
> 184240416 EventType: CREATE_TABLE Unable to process event
> at 
> org.apache.impala.catalog.events.MetastoreEvents$CreateTableEvent.process(MetastoreEvents.java:735)
> at 
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:345)
> at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:772)
> at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:670)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> E1026 03:02:30.650447 116774 MetastoreEventsProcessor.java:795] Notification 
> event is null
> {noformat}
> It should be handled (e.g. ignored) and reported to the admin (e.g. in logs).
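The skip-and-log behavior the fix introduces can be sketched as follows. This is a hypothetical illustration, not Impala's actual event-processor code; the class and method names (`CreateTableEventSketch`, `processCreateTable`) are invented for this example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical sketch of the skip-and-log behavior described above; names
// are invented, not Impala's API.
public class CreateTableEventSketch {
    private static final Logger LOG = Logger.getLogger("EventProcessor");

    // Stand-in for the catalog's view of the databases visible to Impala.
    private final Set<String> visibleDbs = new HashSet<>();

    public CreateTableEventSketch(String... dbs) {
        for (String db : dbs) visibleDbs.add(db);
    }

    /** Returns true if the event was applied, false if it was ignored. */
    public boolean processCreateTable(long eventId, String dbName, String tblName) {
        if (!visibleDbs.contains(dbName)) {
            // Instead of throwing (which would move the event processor into
            // the ERROR state), ignore the event and surface the reason in
            // the logs for the admin.
            LOG.warning("EventId: " + eventId + " CREATE_TABLE " + dbName + "."
                + tblName + " ignored: database not found (possibly invisible"
                + " due to authorization restrictions)");
            return false;
        }
        // ...apply the event to the catalog cache here...
        return true;
    }
}
```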



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13083) Clarify REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error message

2024-05-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848917#comment-17848917
 ] 

ASF subversion and git services commented on IMPALA-13083:
--

Commit 98739a84557a209e05694abd79f62f7f7daf8777 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=98739a845 ]

IMPALA-13083: Clarify REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION

This patch improves the REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error
message by naming the specific configuration that must be adjusted so
that the query can pass admission control. New fields
'per_backend_mem_to_admit_source' and
'coord_backend_mem_to_admit_source' of type MemLimitSourcePB are added
to QuerySchedulePB. These fields record which limiting factor drives
the final values of 'per_backend_mem_to_admit' and
'coord_backend_mem_to_admit', respectively. In turn, admission control
uses this information to compose a more informative error message that
the user can act upon. The new error message pattern also explicitly
mentions "Per Host Min Memory Reservation" as the place to look when
investigating the memory reservations scheduled for each backend node.

Updated documentation with examples of query rejection by Admission
Control and how to read the error message.

Testing:
- Add BE tests at admission-controller-test.cc
- Adjust and pass affected EE tests

Change-Id: I1ef7fb7e7a194b2036c2948639a06c392590bf66
Reviewed-on: http://gerrit.cloudera.org:8080/21436
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Clarify REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error message
> --
>
> Key: IMPALA-13083
> URL: https://issues.apache.org/jira/browse/IMPALA-13083
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Major
>
> The REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error message is too vague for a 
> user or administrator to make the adjustments needed to run a query that is 
> rejected by the admission controller.
> {code:java}
> const string REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION =
> "minimum memory reservation is greater than memory available to the query for "
> "buffer reservations. Memory reservation needed given the current plan: $0. "
> "Adjust either the mem_limit or the pool config (max-query-mem-limit, "
> "min-query-mem-limit) for the query to allow the query memory limit to be at "
> "least $1. Note that changing the mem_limit may also change the plan. See the "
> "query profile for more information about the per-node memory requirements.";
> {code}
> There are many configs and options that directly or indirectly clamp 
> schedule.per_backend_mem_limit() and schedule.per_backend_mem_to_admit().
> [https://github.com/apache/impala/blob/3b35ddc8ca7b0e540fc16c413a170a25e164462b/be/src/scheduling/schedule-state.cc#L262-L361]
> Ideally, this error message should clearly mention which query option, llama 
> config, or backend flag influences the per_backend_mem_limit decision, so 
> that the user can directly adjust that config. It should also explicitly 
> mention the 'Per Host Min Memory Reservation' info string in the query 
> profile instead of just 'per-node memory requirements'.
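The idea behind tracking a memory-limit "source" can be sketched as: record which knob produced the final memory-to-admit value, then name that knob in the rejection message. The enum values and message strings below are purely illustrative, not Impala's actual MemLimitSourcePB values or error text.

```java
// Illustrative sketch only: a "source" enum recording which configuration
// clamped the final memory limit, mapped to an actionable hint. The enum
// values and wording are invented, not Impala's actual MemLimitSourcePB.
public class MemLimitMessageSketch {
    enum MemLimitSource {
        QUERY_OPTION_MEM_LIMIT,    // user set MEM_LIMIT explicitly
        POOL_MAX_QUERY_MEM_LIMIT,  // clamped by the pool's upper bound
        POOL_MIN_QUERY_MEM_LIMIT   // raised to the pool's lower bound
    }

    static String adjustmentHint(MemLimitSource source, long neededBytes) {
        String profileHint =
            " See 'Per Host Min Memory Reservation' in the query profile.";
        switch (source) {
            case QUERY_OPTION_MEM_LIMIT:
                return "Increase the MEM_LIMIT query option to at least "
                    + neededBytes + " bytes." + profileHint;
            case POOL_MAX_QUERY_MEM_LIMIT:
                return "Increase the pool config max-query-mem-limit to at least "
                    + neededBytes + " bytes." + profileHint;
            default:
                return "Lower or unset the pool config min-query-mem-limit."
                    + profileHint;
        }
    }
}
```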






[jira] [Updated] (IMPALA-12867) Filter files to OPTIMIZE based on file size

2024-05-23 Thread Noemi Pap-Takacs (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noemi Pap-Takacs updated IMPALA-12867:
--
Description: 
{{'OPTIMIZE TABLE <table_name>'}} rewrites all files of the table regardless of 
size and type, even if the table does not contain any small or delete files.
With the '{{FILE_SIZE_THRESHOLD_MB}}' option, the user should be able to specify 
a file size limit so that only small files are rewritten.
{code:java}
Syntax: OPTIMIZE TABLE <table_name> (FILE_SIZE_THRESHOLD_MB=100);{code}
The threshold value is a file size in MB. Data files larger than the given limit 
will only be rewritten if they are referenced from delete deltas.

Note that if '{{FILE_SIZE_THRESHOLD_MB}}' is set, only the selected files will 
be rewritten according to the latest schema and partition spec. Therefore the 
untouched data files might still have an older schema or partition layout. Use 
{{'OPTIMIZE TABLE <table_name>'}} to rewrite the entire table according to the 
latest schema and partition layout.

  was:
{{'OPTIMIZE TABLE <table_name>'}} rewrites all files of the table regardless of 
size and type, even if the table does not contain any small or delete files.
With the '{{FILE_SIZE_THRESHOLD}}' option, the user should be able to specify a 
file size limit so that only small files are rewritten.
{code:java}
Syntax: OPTIMIZE TABLE <table_name> (FILE_SIZE_THRESHOLD=100);{code}
The threshold value is a file size in MB. Data files larger than the given limit 
will only be rewritten if they are referenced from delete deltas.

Note that if '{{FILE_SIZE_THRESHOLD}}' is set, only the selected files will 
be rewritten according to the latest schema and partition spec. Therefore the 
untouched data files might still have an older schema or partition layout. Use 
{{'OPTIMIZE TABLE <table_name>'}} to rewrite the entire table according to the 
latest schema and partition layout.


> Filter files to OPTIMIZE based on file size
> ---
>
> Key: IMPALA-12867
> URL: https://issues.apache.org/jira/browse/IMPALA-12867
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Noemi Pap-Takacs
>Assignee: Noemi Pap-Takacs
>Priority: Major
>  Labels: impala-iceberg
>
> {{'OPTIMIZE TABLE <table_name>'}} rewrites all files of the table regardless 
> of size and type, even if the table does not contain any small or delete 
> files.
> With the '{{FILE_SIZE_THRESHOLD_MB}}' option, the user should be able to 
> specify a file size limit so that only small files are rewritten.
> {code:java}
> Syntax: OPTIMIZE TABLE <table_name> (FILE_SIZE_THRESHOLD_MB=100);{code}
> The threshold value is a file size in MB. Data files larger than the given 
> limit will only be rewritten if they are referenced from delete deltas.
> Note that if '{{FILE_SIZE_THRESHOLD_MB}}' is set, only the selected files 
> will be rewritten according to the latest schema and partition spec. 
> Therefore the untouched data files might still have an older schema or 
> partition layout. Use {{'OPTIMIZE TABLE <table_name>'}} to rewrite the entire 
> table according to the latest schema and partition layout.
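The selection rule described above (rewrite files under the threshold, plus any file referenced from delete deltas) can be sketched as below. The types and method names (`DataFile`, `selectFilesToRewrite`) are invented for this example and are not Impala's actual OPTIMIZE implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch of the file-selection rule above; names are invented,
// not Impala's actual OPTIMIZE implementation.
public class OptimizeSelectionSketch {
    static final long MB = 1024L * 1024L;

    static final class DataFile {
        final String path;
        final long sizeBytes;
        DataFile(String path, long sizeBytes) {
            this.path = path;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Picks files to rewrite: under the MB threshold, or referenced by delete deltas. */
    static List<DataFile> selectFilesToRewrite(
            List<DataFile> files, long thresholdMb, Set<String> referencedByDeletes) {
        List<DataFile> selected = new ArrayList<>();
        for (DataFile f : files) {
            boolean small = f.sizeBytes < thresholdMb * MB;
            boolean hasDeletes = referencedByDeletes.contains(f.path);
            // Large, delete-free files are left intact (possibly keeping an
            // older schema or partition layout, as noted above).
            if (small || hasDeletes) selected.add(f);
        }
        return selected;
    }
}
```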






[jira] [Updated] (IMPALA-13074) WRITE TO HDFS node is omitted from Web UI graphic plan

2024-05-23 Thread Noemi Pap-Takacs (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noemi Pap-Takacs updated IMPALA-13074:
--
Description: 
The query plan shows the nodes that take part in the execution, forming a tree 
structure.

It can be displayed in the CLI by issuing the EXPLAIN <query> command. When the 
actual query is executed, the plan tree can also be viewed in the Impala Web UI 
in graphic form.

However, the explain string and the graphic plan tree do not match: the top 
node is missing from the Web UI.

This is especially confusing in the case of DDL and DML statements, where the 
Data Sink is not displayed. It makes a SELECT * FROM table indistinguishable 
from a CREATE TABLE AS SELECT, since both display only the SCAN node and omit 
the WRITE_TO_HDFS and SELECT nodes.

It would make sense to include the WRITE_TO_HDFS node in DML/DDL plans.

  was:
The query plan shows the nodes that take part in the execution, forming a tree 
structure.

It can be displayed in the CLI by issuing the EXPLAIN <query> command. When the 
actual query is executed, the plan tree can also be viewed in the Impala Web UI 
in graphic form.

However, the explain string and the graphic plan tree do not match: the top 
node is missing from the Web UI.

This is especially confusing in the case of DDL and DML statements, where the 
Data Sink is not displayed. It makes a SELECT * FROM table indistinguishable 
from a CREATE TABLE, since both display only the SCAN node and omit the 
WRITE_TO_HDFS and SELECT nodes.

It would make sense to include the WRITE_TO_HDFS node in DML/DDL plans.


> WRITE TO HDFS node is omitted from Web UI graphic plan
> --
>
> Key: IMPALA-13074
> URL: https://issues.apache.org/jira/browse/IMPALA-13074
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Noemi Pap-Takacs
>Priority: Major
>  Labels: ramp-up
>
> The query plan shows the nodes that take part in the execution, forming a 
> tree structure.
> It can be displayed in the CLI by issuing the EXPLAIN <query> command. When 
> the actual query is executed, the plan tree can also be viewed in the Impala 
> Web UI in graphic form.
> However, the explain string and the graphic plan tree do not match: the top 
> node is missing from the Web UI.
> This is especially confusing in the case of DDL and DML statements, where the 
> Data Sink is not displayed. It makes a SELECT * FROM table indistinguishable 
> from a CREATE TABLE AS SELECT, since both display only the SCAN node and omit 
> the WRITE_TO_HDFS and SELECT nodes.
> It would make sense to include the WRITE_TO_HDFS node in DML/DDL plans.


