[jira] [Created] (IMPALA-13155) Not all Tuple::DeepCopy() smallify string values

2024-06-12 Thread Jira
Zoltán Borók-Nagy created IMPALA-13155:
--

 Summary: Not all Tuple::DeepCopy() smallify string values
 Key: IMPALA-13155
 URL: https://issues.apache.org/jira/browse/IMPALA-13155
 Project: IMPALA
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


Currently "Tuple::DeepCopy(const TupleDescriptor& desc, char** data, int* 
offset, bool convert_ptrs)" does not try to smallify string values, although it 
could safely do that.
 
We use that version of DeepCopy when we BROADCAST data between fragments, so 
smallifying on that path can be beneficial.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-06-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854474#comment-17854474
 ] 

ASF subversion and git services commented on IMPALA-12800:
--

Commit 4681666e9386d87c647d19d6333750c16b6fa0c1 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4681666e9 ]

IMPALA-12800: Add cache for isTrueWithNullSlots() evaluation

isTrueWithNullSlots() can be expensive when it has to query the backend.
Many of the expressions will look similar, especially in large
auto-generated expressions. Adds a cache based on the nullified
expression to avoid querying the backend for expressions with identical
structure.

With DEBUG logging enabled for the Analyzer, computes and logs stats
about the null slots cache.

Adds 'use_null_slots_cache' query option to disable caching. Documents
the new option.

Change-Id: Ib63f5553284f21f775d2097b6c5d6bbb63699acd
Reviewed-on: http://gerrit.cloudera.org:8080/21484
Reviewed-by: Quanlong Huang 
Tested-by: Impala Public Jenkins 
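
A minimal sketch of the caching idea in the commit above (illustrative only; the class and method names are hypothetical stand-ins, not the actual Analyzer API): the cache is keyed by the nullified expression, so predicates with identical structure reuse a single backend evaluation instead of each calling the backend.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative sketch only: caches the result of an expensive
// "is this predicate true when all slots are NULL?" check, keyed by the
// nullified expression so structurally identical expressions hit the
// cache instead of triggering another backend evaluation.
class NullSlotsCache {
  private final Map<String, Boolean> cache = new HashMap<>();
  private long hits = 0;
  private long misses = 0;

  // 'nullifiedSql' stands in for the expression with slots replaced by NULL;
  // 'evaluator' stands in for the backend call.
  boolean isTrueWithNullSlots(String nullifiedSql, Predicate<String> evaluator) {
    Boolean cached = cache.get(nullifiedSql);
    if (cached != null) {
      ++hits;
      return cached;
    }
    ++misses;
    boolean result = evaluator.test(nullifiedSql);
    cache.put(nullifiedSql, result);
    return result;
  }

  // Stats of the kind the commit says are logged at DEBUG level.
  String stats() {
    return String.format("null slots cache: hits=%d misses=%d size=%d",
        hits, misses, cache.size());
  }
}
{code}

The 'use_null_slots_cache' query option mentioned in the commit would simply bypass such a map when caching is disabled.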


> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
> URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.5.0
>
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>
> A user running a query with many layers of inline views saw a large amount of 
> time spent in analysis. 
>  
> {noformat}
> - Authorization finished (ranger): 7s518ms (13.134ms)
> - Value transfer graph computed: 7s760ms (241.953ms)
> - Single node plan created: 2m47s (2m39s)
> - Distributed plan created: 2m47s (7.430ms)
> - Lineage info computed: 2m47s (39.017ms)
> - Planning finished: 2m47s (672.518ms){noformat}
> In reproducing it locally, we found that most of the stacks end up in 
> ExprSubstitutionMap.
>  
> Here are the main stacks seen while running jstack every 3 seconds during a 
> 75 second execution:
> Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr 
> equals) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at java.util.ArrayList.indexOf(ArrayList.java:323)
>     at java.util.ArrayList.contains(ArrayList.java:306)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat}
> Location 2:  (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat}
> Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat}
> Location 4:  (TupleIsNullPredicate.wrapExprs ->  Analyzer.isTrueWithNullSlots 
> -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at java.lang.StringCoding.encode(StringCoding.java:364)
>     at java.lang.String.getBytes(String.java:941)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467)
>     at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709)
>     at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400)
>     at org.apache.thrift.TSerializer.serialize(TSerializ

[jira] [Comment Edited] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-06-12 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854473#comment-17854473
 ] 

Michael Smith edited comment on IMPALA-12800 at 6/12/24 3:36 PM:
-

Performance on these types of queries has been substantially improved. We saw 
an improvement of 20x on the example query. It would likely be more on larger 
queries as we switched from O(n^2) to O(n) operations for ExprSubstitutionMap.


was (Author: JIRAUSER288956):
Performance on these types of queries has been substantially improved. We saw 
an improvement of 20x on the example query. It would likely be more on larger 
queries as we switched from O(n^2) to O(n) operations for ExprSubstitutionMap.

> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
> URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.5.0
>
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>
> A user running a query with many layers of inline views saw a large amount of 
> time spent in analysis. 
>  
> {noformat}
> - Authorization finished (ranger): 7s518ms (13.134ms)
> - Value transfer graph computed: 7s760ms (241.953ms)
> - Single node plan created: 2m47s (2m39s)
> - Distributed plan created: 2m47s (7.430ms)
> - Lineage info computed: 2m47s (39.017ms)
> - Planning finished: 2m47s (672.518ms){noformat}
> In reproducing it locally, we found that most of the stacks end up in 
> ExprSubstitutionMap.
>  
> Here are the main stacks seen while running jstack every 3 seconds during a 
> 75 second execution:
> Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr 
> equals) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at java.util.ArrayList.indexOf(ArrayList.java:323)
>     at java.util.ArrayList.contains(ArrayList.java:306)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat}
> Location 2:  (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat}
> Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat}
> Location 4:  (TupleIsNullPredicate.wrapExprs ->  Analyzer.isTrueWithNullSlots 
> -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at java.lang.StringCoding.encode(StringCoding.java:364)
>     at java.lang.String.getBytes(String.java:941)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467)
>     at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709)
>     at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400)
>     at org.apache.thrift.TSerializer.serialize(TSerializer.java:84)
>     at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRowBounded(FeSupport.java:206)
>     at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRow(FeSupport.java:194)
>     at org.apache.impala.service.FeSupport.EvalPredicate(FeSupport.java:275)
>     at 

[jira] [Resolved] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-06-12 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-12800.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

Performance on these types of queries has been substantially improved. We saw 
an improvement of 20x on the example query. It would likely be more on larger 
queries as we switched from O(n^2) to O(n) operations for ExprSubstitutionMap.

> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
> URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.5.0
>
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>
> A user running a query with many layers of inline views saw a large amount of 
> time spent in analysis. 
>  
> {noformat}
> - Authorization finished (ranger): 7s518ms (13.134ms)
> - Value transfer graph computed: 7s760ms (241.953ms)
> - Single node plan created: 2m47s (2m39s)
> - Distributed plan created: 2m47s (7.430ms)
> - Lineage info computed: 2m47s (39.017ms)
> - Planning finished: 2m47s (672.518ms){noformat}
> In reproducing it locally, we found that most of the stacks end up in 
> ExprSubstitutionMap.
>  
> Here are the main stacks seen while running jstack every 3 seconds during a 
> 75 second execution:
> Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr 
> equals) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at java.util.ArrayList.indexOf(ArrayList.java:323)
>     at java.util.ArrayList.contains(ArrayList.java:306)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat}
> Location 2:  (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat}
> Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat}
> Location 4:  (TupleIsNullPredicate.wrapExprs ->  Analyzer.isTrueWithNullSlots 
> -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at java.lang.StringCoding.encode(StringCoding.java:364)
>     at java.lang.String.getBytes(String.java:941)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467)
>     at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709)
>     at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400)
>     at org.apache.thrift.TSerializer.serialize(TSerializer.java:84)
>     at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRowBounded(FeSupport.java:206)
>     at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRow(FeSupport.java:194)
>     at org.apache.impala.service.FeSupport.EvalPredicate(FeSupport.java:275)
>     at 
> org.apache.impala.analysis.Analyzer.isTrueWithNullSlots(Analyzer.java:2888)
>     at 
> org.apache.impala.analysis.TupleIsNullPredicate.requiresNullWrapping(TupleIsNullPredicate.java:181)
>     at 
> org.apache.impala.analysis.TupleIsNullPredicate.wrapExpr(TupleIsNullPredicate.java:147)
>     at 
> org.apache.impala.analysis.TupleIsN

[jira] [Created] (IMPALA-13154) Some tables are missing in Top-N Tables with Highest Memory Requirements

2024-06-12 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13154:
---

 Summary: Some tables are missing in Top-N Tables with Highest 
Memory Requirements
 Key: IMPALA-13154
 URL: https://issues.apache.org/jira/browse/IMPALA-13154
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Reporter: Quanlong Huang


In the /catalog page of catalogd WebUI, there is a table for "Top-N Tables with 
Highest Memory Requirements". However, not all tables are counted there. E.g. 
after starting catalogd, run a DESCRIBE on a table to trigger metadata loading 
on it. When it's done, the table is not shown in the WebUI.

The cause is that the list is only updated in HdfsTable.getTHdfsTable() when 
'type' is 
ThriftObjectType.FULL:
[https://github.com/apache/impala/blob/ee21427d26620b40d38c706b4944d2831f84f6f5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L2457-L2459]

This used to be a place that all code paths using the table went through. 
However, we've since added a bunch of optimizations that avoid building the FULL 
thrift object of the table. We should move the code that updates the list of 
largest tables to a place that all table usages still reach, e.g. after loading 
the metadata of a table we can update its estimatedMetadataSize.
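
A rough sketch of the proposed move (the names below are hypothetical stand-ins, not the real catalogd classes): record the estimated metadata size on every load path, right after metadata loading completes, so the top-N list on the /catalog page sees every loaded table rather than only tables serialized as FULL thrift objects.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: illustrative stand-in, not the real catalogd code. The idea is
// that every code path that loads a table's metadata records its estimated
// size here, instead of the update happening only in HdfsTable.getTHdfsTable()
// when 'type' is ThriftObjectType.FULL.
class TopNTableTracker {
  private final int n;
  private final Map<String, Long> sizes = new ConcurrentHashMap<>();

  TopNTableTracker(int n) { this.n = n; }

  // Would be called right after a table's metadata load/refresh finishes,
  // i.e. wherever estimatedMetadataSize is computed.
  void update(String tableName, long estimatedMetadataSize) {
    sizes.put(tableName, estimatedMetadataSize);
  }

  // Returns up to N table names with the highest estimated metadata size,
  // largest first; this is what the web UI table would render.
  List<String> topN() {
    PriorityQueue<Map.Entry<String, Long>> heap =
        new PriorityQueue<>((a, b) -> Long.compare(a.getValue(), b.getValue()));
    for (Map.Entry<String, Long> e : sizes.entrySet()) {
      heap.offer(e);
      if (heap.size() > n) heap.poll();  // evict the smallest, keep N largest
    }
    List<String> result = new ArrayList<>();
    while (!heap.isEmpty()) result.add(heap.poll().getKey());
    Collections.reverse(result);  // poll order is smallest first
    return result;
  }
}
{code}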



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13153) Unreachable catch clause in MetastoreEvents.java

2024-06-11 Thread Sai Hemanth Gantasala (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854208#comment-17854208
 ] 

Sai Hemanth Gantasala commented on IMPALA-13153:


Thanks for raising the concern. I'll address this issue soon.

> Unreachable catch clause in MetastoreEvents.java
> 
>
> Key: IMPALA-13153
> URL: https://issues.apache.org/jira/browse/IMPALA-13153
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 4.5.0
>Reporter: Laszlo Gaal
>Assignee: Sai Hemanth Gantasala
>Priority: Critical
>
> In recent builds of master the frontend build reports the following warning:
> {code}
> 22:38:28 20:38:19 [WARNING] 
> /home/ubuntu/Impala/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:[1466,9]
>  unreachable catch clause
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13153) Unreachable catch clause in MetastoreEvents.java

2024-06-11 Thread Laszlo Gaal (Jira)
Laszlo Gaal created IMPALA-13153:


 Summary: Unreachable catch clause in MetastoreEvents.java
 Key: IMPALA-13153
 URL: https://issues.apache.org/jira/browse/IMPALA-13153
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Affects Versions: Impala 4.5.0
Reporter: Laszlo Gaal
Assignee: Sai Hemanth Gantasala


In recent builds of master the frontend build reports the following warning:
{code}
22:38:28 20:38:19 [WARNING] 
/home/ubuntu/Impala/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:[1466,9]
 unreachable catch clause
{code}
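
For readers unfamiliar with this javac diagnostic, a minimal, generic illustration of an unreachable catch clause (a standalone sketch, not the actual code in MetastoreEvents.java): when the try block can only throw exception types that earlier clauses already handle, a later catch clause can never be selected.

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;

class UnreachableCatchExample {
  static void read() {
    try {
      // The only checked exception this block can throw:
      throw new FileNotFoundException("missing");
    } catch (FileNotFoundException e) {
      System.out.println("not found: " + e.getMessage());
    } catch (IOException e) {
      // javac warns here ("unreachable catch clause"): every IOException
      // the try block can throw is already caught by the clause above.
      System.out.println("never reached");
    }
  }
}
{code}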



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS

2024-06-11 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-11871.
--
Resolution: Fixed

Resolving the issue since the fix has been merged.

> INSERT statement does not respect Ranger policies for HDFS
> --
>
> Key: IMPALA-11871
> URL: https://issues.apache.org/jira/browse/IMPALA-11871
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> In a cluster with Ranger auth (and with legacy catalog mode), even if you 
> provide RWX to cm_hdfs -> all-path for the user impala, inserting into a 
> table whose HDFS POSIX permissions happen to exclude impala access will 
> result in an
> {noformat}
> "AnalysisException: Unable to INSERT into target table (default.t1) because 
> Impala does not have WRITE access to HDFS location: 
> hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat}
>  
> {noformat}
> [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl 
> /warehouse/tablespace/external/hive/t1
> file: /warehouse/tablespace/external/hive/t1 
> owner: hive 
> group: supergroup
> user::rwx
> user:impala:rwx #effective:r-x
> group::rwx #effective:r-x
> mask::r-x
> other::---
> default:user::rwx
> default:user:impala:rwx
> default:group::rwx
> default:mask::rwx
> default:other::--- {noformat}
> ~~
> ANALYSIS
> Stack trace from a version of Cloudera's distribution of Impala (impalad 
> version 3.4.0-SNAPSHOT RELEASE (build 
> {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})):
> {noformat}
> at 
> org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585)
> at 
> org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545)
> at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391)
> at 
> org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463)
> at 
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426)
> at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570)
> at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536)
> at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155){noformat}
> The exception occurs at analysis time, so I tested and succeeded in writing 
> directly into the said directory.
> {noformat}
> [root@nightly-71x-vx-3 ~]# hdfs dfs -touchz 
> /warehouse/tablespace/external/hive/t1/test
> [root@nightly-71x-vx-3 ~]# hdfs dfs -ls 
> /warehouse/tablespace/external/hive/t1/
> Found 8 items
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:37 
> /warehouse/tablespace/external/hive/t1/00_0
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:44 
> /warehouse/tablespace/external/hive/t1/00_0_copy_1
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:49 
> /warehouse/tablespace/external/hive/t1/00_0_copy_2
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:53 
> /warehouse/tablespace/external/hive/t1/00_0_copy_3
> rw-rw---+ 3 impala hive 355 2023-01-27 17:17 
> /warehouse/tablespace/external/hive/t1/4c4477c12c51ad96-3126b52d_2029811630_data.0.parq
> rw-rw---+ 3 impala hive 355 2023-01-27 17:39 
> /warehouse/tablespace/external/hive/t1/9945b25bb37d1ff2-473c1478_574471191_data.0.parq
> drwxrwx---+ - impala hive 0 2023-01-27 17:39 
> /warehouse/tablespace/external/hive/t1/_impala_insert_staging
> rw-rw---+ 3 impala supergroup 0 2023-01-27 18:01 
> /warehouse/tablespace/external/hive/t1/test{noformat}
> Reviewing the code[1], I traced the {{TAccessLevel}} to the catalogd. And if 
> I add user impala to group supergroup on the catalogd host, this query will 
> succeed past the authorization.
> Additionally, this query does not trip up during analysis when catalog v2 is 
> enabled because the method {{getFirstLocationWithoutWriteAccess()}} is not 
> implemented there yet and always returns null[2].
> [1] 
> [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L494-L504]
> [2] 
> [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java#L295-L298]
> ~~
> Ideally, when Ranger authorization is in place, we should:
> 1) Not check access level during analysis
> 2) Incorporate Ranger ACLs during analysis



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns

2024-06-11 Thread Riza Suminto (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854130#comment-17854130
 ] 

Riza Suminto commented on IMPALA-13152:
---

Filed a patch at: [https://gerrit.cloudera.org/c/21504/] 

> IllegalStateException in computing processing cost when there are predicates 
> on analytic output columns
> ---
>
> Key: IMPALA-13152
> URL: https://issues.apache.org/jira/browse/IMPALA-13152
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Quanlong Huang
>Assignee: Riza Suminto
>Priority: Major
>
> Saw an error in the following query when COMPUTE_PROCESSING_COST is on:
> {code:sql}
> create table tbl (a int, b int, c int);
> set COMPUTE_PROCESSING_COST=1;
> explain select a, b from (
>   select a, b, c,
> row_number() over(partition by a order by b desc) as latest
>   from tbl
> )b
> WHERE latest=1
> ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid!
> {code}
> Exception in the logs:
> {noformat}
> I0611 13:04:37.192874 28004 jni-util.cc:321] 
> 264ee79bfb6ac031:42f8006c] java.lang.IllegalStateException: 
> Processing cost of PlanNode 01:TOP-N is invalid!
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:512)
> at 
> org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047)
> at 
> org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287)
> at 
> org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932)
> at 
> org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892)
> at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676)
> at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175){noformat}
> Don't see the error if removing the predicate "latest=1".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-11 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13151.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
> -
>
> Key: IMPALA-13151
> URL: https://issues.apache.org/jira/browse/IMPALA-13151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
> Fix For: Impala 4.5.0
>
>
> The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
> failing with errors like this:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
> Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
> actual: 269834 vs 30{noformat}
> So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns

2024-06-11 Thread Riza Suminto (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854089#comment-17854089
 ] 

Riza Suminto commented on IMPALA-13152:
---

Tried your example and I get NaN for BaseProcessingCost.
{noformat}Query: explain select a, b from (
  select a, b, c,
row_number() over(partition by a order by b desc) as latest
  from tbl
)b
WHERE latest=1
ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid! 
cost-total=0 max-instances=1 cost/inst=0 #cons:#prod=0:0 
total-cost=NaN{noformat}
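
One Java detail that may explain why a NaN cost is reported as "invalid" (an inference from the message above, not something stated in the thread): every ordered comparison against NaN evaluates to false, so a Preconditions.checkState that requires the cost to be non-negative fails as soon as the total cost becomes NaN. A tiny illustration:

{code:java}
import com.google.common.base.Preconditions;

class NaNCostCheck {
  public static void main(String[] args) {
    // NaN; how the planner actually ends up with NaN is not established here.
    double totalCost = 0.0 / 0.0;
    System.out.println(totalCost >= 0);  // false: NaN fails every comparison
    // Mirrors the style of check behind "Processing cost ... is invalid!";
    // this line throws IllegalStateException.
    Preconditions.checkState(totalCost >= 0, "Processing cost is invalid!");
  }
}
{code}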

> IllegalStateException in computing processing cost when there are predicates 
> on analytic output columns
> ---
>
> Key: IMPALA-13152
> URL: https://issues.apache.org/jira/browse/IMPALA-13152
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Quanlong Huang
>Assignee: Riza Suminto
>Priority: Major
>
> Saw an error in the following query when COMPUTE_PROCESSING_COST is on:
> {code:sql}
> create table tbl (a int, b int, c int);
> set COMPUTE_PROCESSING_COST=1;
> explain select a, b from (
>   select a, b, c,
> row_number() over(partition by a order by b desc) as latest
>   from tbl
> )b
> WHERE latest=1
> ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid!
> {code}
> Exception in the logs:
> {noformat}
> I0611 13:04:37.192874 28004 jni-util.cc:321] 
> 264ee79bfb6ac031:42f8006c] java.lang.IllegalStateException: 
> Processing cost of PlanNode 01:TOP-N is invalid!
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:512)
> at 
> org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047)
> at 
> org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287)
> at 
> org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932)
> at 
> org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892)
> at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676)
> at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175){noformat}
> Don't see the error if removing the predicate "latest=1".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns

2024-06-11 Thread Riza Suminto (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854069#comment-17854069
 ] 

Riza Suminto commented on IMPALA-13152:
---

[~stigahuang] does this still happen after IMPALA-13119?

I tried a similar query below and it works:
{noformat}
Query: explain select item_sk, rk from (
select
  ss_item_sk item_sk, ss_sold_time_sk, ss_customer_sk,
  row_number()
  over (partition by ss_item_sk order by ss_sold_time_sk) rk
from store_sales
) b
where rk = 1
+--------------------------------------------------------------------------------------------------------------+
| Explain String |
+--------------------------------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=28.00MB Threads=4 |
| Per-Host Resource Estimates: Memory=58MB |
| Analyzed query: SELECT item_sk, rk FROM (SELECT ss_item_sk item_sk, |
| ss_sold_time_sk, ss_customer_sk, row_number() OVER (PARTITION BY ss_item_sk |
| ORDER BY ss_sold_time_sk ASC) rk FROM tpcds_parquet.store_sales) b WHERE rk = |
| CAST(1 AS BIGINT) |
| |
| F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
| |  Per-Instance Resources: mem-estimate=4.20MB mem-reservation=4.00MB thread-reservation=1 |
| |  max-parallelism=1 segment-costs=[40262] cpu-comparison-result=6 [max(1 (self) vs 6 (sum children))] |
| PLAN-ROOT SINK |
| |  output exprs: ss_item_sk, row_number() |
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 cost=35950 |
| | |
| 06:EXCHANGE [UNPARTITIONED] |
| |  mem-estimate=201.02KB mem-reservation=0B thread-reservation=0 |
| |  tuple-ids=5,4 row-size=20B cardinality=17.98K cost=4312 |
| |  in pipelines: 05(GETNEXT) |
| | |
| F01:PLAN FRAGMENT [HASH(ss_item_sk)] hosts=3 instances=3 (adjusted from 384) |
| Per-Instance Resources: mem-estimate=10.16MB mem-reservation=10.00MB thread-reservation=1 |
| max-parallelism=3 segment-costs=[146224, 77623] cpu-comparison-result=6 [max(3 (self) vs 6 (sum children))] |
| 03:SELECT |
| |  predicates: row_number() = CAST(1 AS BIGINT) |
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0 |
| |  tuple-ids=5,4 row-size=20B cardinality=17.98K cost=17975 |
| |  in pipelines: 05(GETNEXT) |
| | |
| 02:ANALYTIC |
| |  functions: row_number() |
| |  partition by: ss_item_sk |
| |  order by: ss_sold_time_sk ASC |
| |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 |
| |  tuple-ids=5,4 row-size=20B cardinality=17.98K cost=17975 |
| |  in pipelines: 05(GETNEXT

[jira] [Work started] (IMPALA-13150) Possible buffer overflow in StringVal::CopyFrom()

2024-06-11 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13150 started by Daniel Becker.
--
> Possible buffer overflow in StringVal::CopyFrom()
> -
>
> Key: IMPALA-13150
> URL: https://issues.apache.org/jira/browse/IMPALA-13150
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> In {{{}StringVal::CopyFrom(){}}}, we take the 'len' parameter as a 
> {{{}size_t{}}}, which is usually a 64-bit unsigned integer. We pass it to the 
> constructor of {{{}StringVal{}}}, which takes it as an {{{}int{}}}, which is 
> usually a 32-bit signed integer. The constructor then allocates memory for 
> the length using the {{int}} value, but back in {{{}CopyFrom(){}}}, we copy 
> the buffer with the {{size_t}} length. If {{size_t}} is indeed 64 bits and 
> {{int}} is 32 bits, and the value is truncated, we may copy more bytes than 
> we have allocated the destination for. See 
> https://github.com/apache/impala/blob/ce8078204e5995277f79e226e26fe8b9eaca408b/be/src/udf/udf.cc#L546



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13150) Possible buffer overflow in StringVal::CopyFrom()

2024-06-11 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-13150:
---
Summary: Possible buffer overflow in StringVal::CopyFrom()  (was: Possible 
buffer overflow in StringVal)

> Possible buffer overflow in StringVal::CopyFrom()
> -
>
> Key: IMPALA-13150
> URL: https://issues.apache.org/jira/browse/IMPALA-13150
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> In {{{}StringVal::CopyFrom(){}}}, we take the 'len' parameter as a 
> {{{}size_t{}}}, which is usually a 64-bit unsigned integer. We pass it to the 
> constructor of {{{}StringVal{}}}, which takes it as an {{{}int{}}}, which is 
> usually a 32-bit signed integer. The constructor then allocates memory for 
> the length using the {{int}} value, but back in {{{}CopyFrom(){}}}, we copy 
> the buffer with the {{size_t}} length. If {{size_t}} is indeed 64 bits and 
> {{int}} is 32 bits, and the value is truncated, we may copy more bytes than 
> we have allocated the destination for. See 
> https://github.com/apache/impala/blob/ce8078204e5995277f79e226e26fe8b9eaca408b/be/src/udf/udf.cc#L546



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns

2024-06-11 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853924#comment-17853924
 ] 

Quanlong Huang commented on IMPALA-13152:
-

Assigning this to [~rizaon], who knows more about this.

> IllegalStateException in computing processing cost when there are predicates 
> on analytic output columns
> ---
>
> Key: IMPALA-13152
> URL: https://issues.apache.org/jira/browse/IMPALA-13152
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Quanlong Huang
>Assignee: Riza Suminto
>Priority: Major
>
> Saw an error in the following query when COMPUTE_PROCESSING_COST is on:
> {code:sql}
> create table tbl (a int, b int, c int);
> set COMPUTE_PROCESSING_COST=1;
> explain select a, b from (
>   select a, b, c,
> row_number() over(partition by a order by b desc) as latest
>   from tbl
> )b
> WHERE latest=1
> ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid!
> {code}
> Exception in the logs:
> {noformat}
> I0611 13:04:37.192874 28004 jni-util.cc:321] 
> 264ee79bfb6ac031:42f8006c] java.lang.IllegalStateException: 
> Processing cost of PlanNode 01:TOP-N is invalid!
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:512)
> at 
> org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047)
> at 
> org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287)
> at 
> org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932)
> at 
> org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892)
> at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676)
> at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175){noformat}
> Don't see the error if removing the predicate "latest=1".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853919#comment-17853919
 ] 

ASF subversion and git services commented on IMPALA-12800:
--

Commit 800246add5fcb20c34a767870346f6ce255e41f9 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=800246add ]

IMPALA-12800: Use HashMap for ExprSubstitutionMap lookups

Adds a HashMap to ExprSubstitutionMap to speed lookups while retaining
lists for correct ordering (ordering needs to match to SlotRef order).
Ignores duplicate inserts, preserving the old behavior that only the
first match would actually be usable; duplicates primarily show up as a
result of combining duplicate distinct and aggregate expressions, or
redundant nested aggregation (like the tests for IMPALA-10182).

Implements localHash and hashCode for Expr and related classes.

Avoids deep-cloning LHS Exprs in ExprSubstitutionMap as they're used for
lookup and not expected to be mutated.

Adds the many expressions test, which now runs in a handful of seconds.

Change-Id: Ic538a82c69ee1dd76981fbacf95289c9d00ea9fe
Reviewed-on: http://gerrit.cloudera.org:8080/21483
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
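
As a rough sketch of the data-structure change described above (a simplified stand-in, not the actual ExprSubstitutionMap code; it assumes the element type has value-based equals()/hashCode(), which is what the commit adds for Expr): keep the parallel lhs/rhs lists for ordering, add a HashMap for constant-time lookup, and ignore duplicate inserts so only the first mapping for a key is used.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the idea: lists preserve insertion order (which
// must line up between lhs and rhs), while the map gives O(1) lookup instead
// of a linear scan that calls equals() on every entry.
class SubstitutionMapSketch<E> {
  private final List<E> lhs = new ArrayList<>();
  private final List<E> rhs = new ArrayList<>();
  private final Map<E, E> lookup = new HashMap<>();

  // Ignores duplicate keys, preserving the old behavior where only the
  // first match for a given lhs expression was ever returned.
  void put(E key, E value) {
    if (lookup.containsKey(key)) return;
    lhs.add(key);
    rhs.add(value);
    lookup.put(key, value);
  }

  // Previously an O(n) scan over 'lhs'; now a hash probe.
  E get(E key) {
    return lookup.get(key);
  }

  int size() { return lhs.size(); }
}
{code}

With n mappings this turns each lookup during compose/verify from a linear scan into an expected constant-time probe, which is the O(n^2) to O(n) improvement mentioned elsewhere in this thread.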


> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
> URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>
> A user running a query with many layers of inline views saw a large amount of 
> time spent in analysis. 
>  
> {noformat}
> - Authorization finished (ranger): 7s518ms (13.134ms)
> - Value transfer graph computed: 7s760ms (241.953ms)
> - Single node plan created: 2m47s (2m39s)
> - Distributed plan created: 2m47s (7.430ms)
> - Lineage info computed: 2m47s (39.017ms)
> - Planning finished: 2m47s (672.518ms){noformat}
> In reproducing it locally, we found that most of the stacks end up in 
> ExprSubstitutionMap.
>  
> Here are the main stacks seen while running jstack every 3 seconds during a 
> 75 second execution:
> Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr 
> equals) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at java.util.ArrayList.indexOf(ArrayList.java:323)
>     at java.util.ArrayList.contains(ArrayList.java:306)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat}
> Location 2:  (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat}
> Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat}
> Location 4:  (TupleIsNullPredicate.wrapExprs ->  Analyzer.isTrueWithNullSlots 
> -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at java.lang.StringCoding.encode(StringCoding.java:364)
>     at java.lang.String.getBytes(String.java:941)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467)
>     at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.wr

[jira] [Commented] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853921#comment-17853921
 ] 

ASF subversion and git services commented on IMPALA-13151:
--

Commit cce6b349f1103c167e2e9ef49fa181ede301b94f in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=cce6b349f ]

IMPALA-13151: Use MonotonicNanos to track test time

Uses MonotonicNanos to track test time rather than MonotonicStopWatch.
IMPALA-2407 updated MonotonicStopWatch to use a low-precision
implementation for performance, which on ARM in particular sometimes
results in undercounting time by a few microseconds. That's enough to
cause a failure in DataStreamTestSlowServiceQueue.TestPrioritizeEos.

Also uses SleepForMs and NANOS_PER_SEC rather than Kudu versions to
better match Impala code base.

Reproduced on ARM and tested the new implementation for several dozen
runs without failure.

Change-Id: I9beb63669c5bdd910e5f713ecd42551841e95400
Reviewed-on: http://gerrit.cloudera.org:8080/21497
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 


> DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
> -
>
> Key: IMPALA-13151
> URL: https://issues.apache.org/jira/browse/IMPALA-13151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
>
> The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
> failing with errors like this:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
> Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
> actual: 269834 vs 30{noformat}
> So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-2407) Nested Types : Remove calls to clock_gettime for a 9x performance improvement on EC2

2024-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853922#comment-17853922
 ] 

ASF subversion and git services commented on IMPALA-2407:
-

Commit cce6b349f1103c167e2e9ef49fa181ede301b94f in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=cce6b349f ]

IMPALA-13151: Use MonotonicNanos to track test time

Uses MonotonicNanos to track test time rather than MonotonicStopWatch.
IMPALA-2407 updated MonotonicStopWatch to use a low-precision
implementation for performance, which on ARM in particular sometimes
results in undercounting time by a few microseconds. That's enough to
cause a failure in DataStreamTestSlowServiceQueue.TestPrioritizeEos.

Also uses SleepForMs and NANOS_PER_SEC rather than Kudu versions to
better match Impala code base.

Reproduced on ARM and tested the new implementation for several dozen
runs without failure.

Change-Id: I9beb63669c5bdd910e5f713ecd42551841e95400
Reviewed-on: http://gerrit.cloudera.org:8080/21497
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 


> Nested Types : Remove calls to clock_gettime for a 9x performance improvement 
> on EC2
> 
>
> Key: IMPALA-2407
> URL: https://issues.apache.org/jira/browse/IMPALA-2407
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.3.0
>Reporter: Mostafa Mokhtar
>Assignee: Jim Apple
>Priority: Critical
>  Labels: ec2, performance, ramp-up
> Fix For: Impala 2.5.0
>
> Attachments: q12Nested.tar.gz
>
>
> Queries against Nested types show that ~90% of the time is spent in 
> clock_gettime. 
> A cheaper accounting method can speed up Nested queries by 8-9x
> {code}
> select
>   count(*)
> from
>   customer.orders_string o,
>   o.lineitems_string l
> where
>   l_shipmode in ('MAIL', 'SHIP')
>   and l_commitdate < l_receiptdate
>   and l_shipdate < l_commitdate
>   and l_receiptdate >= '1994-01-01'
>   and l_receiptdate < '1995-01-01'
> group by
>   l_shipmode
> order by
>   l_shipmode
> {code}
> Schema
> +---------------+----------------------------------+---------+
> | name          | type                             | comment |
> +---------------+----------------------------------+---------+
> | c_custkey     | bigint                           |         |
> | c_name        | string                           |         |
> | c_address     | string                           |         |
> | c_nationkey   | bigint                           |         |
> | c_phone       | string                           |         |
> | c_acctbal     | double                           |         |
> | c_mktsegment  | string                           |         |
> | c_comment     | string                           |         |
> | orders_string | array<struct<                    |         |
> |               |   o_orderkey:bigint,             |         |
> |               |   o_orderstatus:string,          |         |
> |               |   o_totalprice:double,           |         |
> |               |   o_orderdate:string,            |         |
> |               |   o_orderpriority:string,        |         |
> |               |   o_clerk:string,                |         |
> |               |   o_shippriority:bigint,         |         |
> |               |   o_comment:string,              |         |
> |               |   lineitems_string:array<struct< |         |
> |               |     l_partkey:bigint,            |         |
> |               |     l_suppkey:bigint,            |         |
> |               |     l_linenumber:bigint,         |         |
> |               |     l_quantity:double,           |         |
> |               |     l_extendedprice:double,      |         |
> |               |     l_discount:double,           |         |
> |               |     l_tax:double,

[jira] [Commented] (IMPALA-10182) Rows with NULLs filtered out with duplicate columns in subquery select inside UNION ALL

2024-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853920#comment-17853920
 ] 

ASF subversion and git services commented on IMPALA-10182:
--

Commit 800246add5fcb20c34a767870346f6ce255e41f9 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=800246add ]

IMPALA-12800: Use HashMap for ExprSubstitutionMap lookups

Adds a HashMap to ExprSubstitutionMap to speed lookups while retaining
lists for correct ordering (ordering needs to match to SlotRef order).
Ignores duplicate inserts, preserving the old behavior that only the
first match would actually be usable; duplicates primarily show up as a
result of combining duplicate distinct and aggregate expressions, or
redundant nested aggregation (like the tests for IMPALA-10182).

Implements localHash and hashCode for Expr and related classes.

Avoids deep-cloning LHS Exprs in ExprSubstitutionMap as they're used for
lookup and not expected to be mutated.

Adds the many expressions test, which now runs in a handful of seconds.

Change-Id: Ic538a82c69ee1dd76981fbacf95289c9d00ea9fe
Reviewed-on: http://gerrit.cloudera.org:8080/21483
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Rows with NULLs filtered out with duplicate columns in subquery select inside 
> UNION ALL
> ---
>
> Key: IMPALA-10182
> URL: https://issues.apache.org/jira/browse/IMPALA-10182
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Tim Armstrong
>Assignee: Aman Sinha
>Priority: Blocker
>  Labels: correctness
> Fix For: Impala 4.0.0
>
>
> Bug report from here - 
> https://community.cloudera.com/t5/Support-Questions/quot-union-all-quot-dropping-records-with-all-null-empty/m-p/303153#M221415
> Repro:
> {noformat}
> create database if not exists as_adventure;
> use as_adventure;
> CREATE tABLE IF NOT EXISTS
> as_adventure.t1 
> ( 
> productsubcategorykey INT, 
> productline STRING);
> insert into t1 values (1,'l1');
> insert into t1 values (2,'l1');
> insert into t1 values (1,'l2');
> insert into t1 values (3,'l3');
> insert into t1 values (null,'');
> select * from t1; 
> SELECT
> MIN(t_53.c_41)   c_41,
> CAST(NULL AS DOUBLE) c_43,
> CAST(NULL AS BIGINT) c_44,
> t_53.c2  c2,
> t_53.c3s0    c3s0,
> t_53.c4      c4,
> t_53.c5s0    c5s0
> FROM
> (   SELECT
> t.productsubcategorykey c_41,
> t.productline   c2,
> t.productline   c3s0,
> t.productsubcategorykey c4,
> t.productsubcategorykey c5s0
> FROM
> as_adventure.t1 t
> WHERE
> true
> GROUP BY
> 2,
> 3,
> 4,
> 5 ) t_53
> GROUP BY
> 4,
> 5,
> 6,
> 7
>  
> UNION ALL
> SELECT
> MIN(t_53.c_41)   c_41,
> CAST(NULL AS DOUBLE) c_43,
> CAST(NULL AS BIGINT) c_44,
> t_53.c2  c2,
> t_53.c3s0    c3s0,
> t_53.c4      c4,
> t_53.c5s0    c5s0
> FROM
> (   SELECT
> t.productsubcategorykey c_41,
> t.productline   c2,
> t.productline   c3s0,
> t.productsubcategorykey c4,
> t.productsubcategorykey c5s0
> FROM
> as_adventure.t1 t
> WHERE
> true
> GROUP BY
> 2,
> 3,
> 4,
> 5 ) t_53
> GROUP BY
> 4,
> 5,
> 6,
> 7
> {noformat}
> Somewhat similar to IMPALA-7957 in that the inferred predicates from the 
> column equivalences get placed in a Select node. It's a bit different in that 
> the NULLs that are filtered out from the predicates come from the base table.
> {noformat}
> +------------------------------------------------------------------------------+
> | Explain String                                                                |
> +------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=136.02MB Threads=6                  |
> | Per-Host Resource Estimates: Memory=576MB                                     |
> | WARNI

[jira] [Commented] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS

2024-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853923#comment-17853923
 ] 

ASF subversion and git services commented on IMPALA-11871:
--

Commit f7e629935b77f412bf74aeebd704af88f03de351 in impala's branch 
refs/heads/master from halim.kim
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f7e629935 ]

IMPALA-11871: Skip permissions loading and check on HDFS if Ranger is enabled

Before this patch, Impala checked whether the Impala service user had
the WRITE access to the target HDFS table/partition(s) during the
analysis of the INSERT and LOAD DATA statements in the legacy catalog
mode. The access levels of the corresponding HDFS table and partitions
were computed by the catalog server solely based on the HDFS permissions
and ACLs when the table and partitions were instantiated.

After this patch, we skip loading HDFS permissions and assume the
Impala service user has the READ_WRITE permission on all the HDFS paths
associated with the target table during query analysis when Ranger is
enabled. The assumption could be removed after Impala's implementation
of FsPermissionChecker could additionally take Ranger's policies of HDFS
into consideration when performing the check.

Testing:
 - Added end-to-end tests to verify Impala's behavior with respect to
   the INSERT and LOAD DATA statements when Ranger is enabled in the
   legacy catalog mode.

Change-Id: Id33c400fbe0c918b6b65d713b09009512835a4c9
Reviewed-on: http://gerrit.cloudera.org:8080/20221
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> INSERT statement does not respect Ranger policies for HDFS
> --
>
> Key: IMPALA-11871
> URL: https://issues.apache.org/jira/browse/IMPALA-11871
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> In a cluster with Ranger auth (and with legacy catalog mode), even if you 
> provide RWX to cm_hdfs -> all-path for the user impala, inserting into a 
> table whose HDFS POSIX permissions happen to exclude impala access will 
> result in an
> {noformat}
> "AnalysisException: Unable to INSERT into target table (default.t1) because 
> Impala does not have WRITE access to HDFS location: 
> hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat}
>  
> {noformat}
> [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl 
> /warehouse/tablespace/external/hive/t1
> file: /warehouse/tablespace/external/hive/t1 
> owner: hive 
> group: supergroup
> user::rwx
> user:impala:rwx #effective:r-x
> group::rwx #effective:r-x
> mask::r-x
> other::---
> default:user::rwx
> default:user:impala:rwx
> default:group::rwx
> default:mask::rwx
> default:other::--- {noformat}
> ~~
> ANALYSIS
> Stack trace from a version of Cloudera's distribution of Impala (impalad 
> version 3.4.0-SNAPSHOT RELEASE (build 
> {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})):
> {noformat}
> at 
> org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585)
> at 
> org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545)
> at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391)
> at 
> org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463)
> at 
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426)
> at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570)
> at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536)
> at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155){noformat}
> The exception occurs at analysis time, so I tested and succeeded in writing 
> directly into the said directory.
> {noformat}
> [root@nightly-71x-vx-3 ~]# hdfs dfs -touchz 
> /warehouse/tablespace/external/hive/t1/test
> [root@nightly-71x-vx-3 ~]# hdfs dfs -ls 
> /warehouse/tablespace/external/hive/t1/
> Found 8 items
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:37 
> /warehouse/tablespace/external/hive/t1/00_0
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:44 
> /warehouse/tablespace/external/hive/t1/00_0_copy_1
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:49 
> /warehouse/tablespace/external/hive/t1/00_0_copy_2
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:53 
> /warehouse/tablespace/external/hive/t1/00_0_copy_3
> rw-rw---+ 3 im

[jira] [Created] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns

2024-06-10 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13152:
---

 Summary: IllegalStateException in computing processing cost when 
there are predicates on analytic output columns
 Key: IMPALA-13152
 URL: https://issues.apache.org/jira/browse/IMPALA-13152
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Quanlong Huang
Assignee: Riza Suminto


Saw an error in the following query when COMPUTE_PROCESSING_COST is on:
{code:sql}
create table tbl (a int, b int, c int);

set COMPUTE_PROCESSING_COST=1;

explain select a, b from (
  select a, b, c,
row_number() over(partition by a order by b desc) as latest
  from tbl
)b
WHERE latest=1

ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid!
{code}
Exception in the logs:
{noformat}
I0611 13:04:37.192874 28004 jni-util.cc:321] 264ee79bfb6ac031:42f8006c] 
java.lang.IllegalStateException: Processing cost of PlanNode 01:TOP-N is 
invalid!
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:512)
at 
org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047)
at 
org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287)
at 
org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560)
at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932)
at 
org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892)
at 
org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676)
at 
org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
at 
org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175){noformat}
The error does not occur if the predicate "latest=1" is removed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13093) Insert into Huawei OBS table failed

2024-06-10 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853843#comment-17853843
 ] 

Quanlong Huang commented on IMPALA-13093:
-

It seems adding this to hdfs-site.xml can also fix the issue:
{code:xml}
<property>
  <name>fs.obs.file.visibility.enable</name>
  <value>true</value>
</property>
{code}
I'll check whether OBS returns the real block size.
CC [~michaelsmith] [~eyizoha]

> Insert into Huawei OBS table failed
> ---
>
> Key: IMPALA-13093
> URL: https://issues.apache.org/jira/browse/IMPALA-13093
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> Insert into a table using Huawei OBS (Object Storage Service) as the storage 
> will fail with the following error:
> {noformat}
> Query: insert into test_obs1 values (1, 'abc')
> ERROR: Failed to get info on temporary HDFS file: 
> obs://obs-test-ee93/input/test_obs1/_impala_insert_staging/fe4ac1be6462a13f_362a9b5b/.fe4ac1be6462a13f-362a9b5b_1213692075_dir//fe4ac1be6462a13f-362a9b5b_375832652_data.0.txt
> Error(2): No such file or directory {noformat}
> Looking into the logs:
> {noformat}
> I0516 16:40:55.663640 18922 status.cc:129] fe4ac1be6462a13f:362a9b5b] 
> Failed to get info on temporary HDFS file: 
> obs://obs-test-ee93/input/test_obs1/_impala_insert_staging/fe4ac1be6462a13f_362a9b5b/.fe4ac1be6462a13f-362a9b5b_1213692075_dir//fe4ac1be6462a13f-362a9b5b_375832652_data.0.txt
> Error(2): No such file or directory
> @   0xfc6d44  impala::Status::Status()
> @  0x1c42020  impala::HdfsTableSink::CreateNewTmpFile()
> @  0x1c44357  impala::HdfsTableSink::InitOutputPartition()
> @  0x1c4988a  impala::HdfsTableSink::GetOutputPartition()
> @  0x1c46569  impala::HdfsTableSink::Send()
> @  0x14ee25f  impala::FragmentInstanceState::ExecInternal()
> @  0x14efca3  impala::FragmentInstanceState::Exec()
> @  0x148dc4c  impala::QueryState::ExecFInstance()
> @  0x1b3bab9  impala::Thread::SuperviseThread()
> @  0x1b3cdb1  boost::detail::thread_data<>::run()
> @  0x2474a87  thread_proxy
> @ 0x7fe5a562dea5  start_thread
> @ 0x7fe5a25ddb0d  __clone{noformat}
> Note that impalad is started with {{--symbolize_stacktrace=true}} so the 
> stacktrace has symbols.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13077) Equality predicate on partition column and uncorrelated subquery doesn't reduce the cardinality estimate

2024-06-10 Thread Riza Suminto (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853842#comment-17853842
 ] 

Riza Suminto commented on IMPALA-13077:
---

Looks like this is a bug in how lhsNdv and rhsNdv are calculated. In the current 
code, if either the NDV or the cardinality of an equality expression is unknown 
(-1), getSemiJoinCardinality will skip that expression.

[https://github.com/apache/impala/blob/e7dac008bbafb20e4c7d15d46f2bac9a757f/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L720-L726]
 

If the NDV is unknown but the cardinality is known, that code should fall back 
to using the cardinality as the NDV instead. I tested that hack and confirmed 
through the LOG output that it lowers the join cardinality.
{code:java}
I0610 17:09:25.739796 3972670 JoinNode.java:719] 
774dd75ed2b1fc53:c78b86b2] eqJoinConjuncts_.size=1
I0610 17:09:25.739863 3972670 JoinNode.java:755] 
774dd75ed2b1fc53:c78b86b2] getSemiJoinCardinality calculate selectivity 
for (ss_sold_date_sk = min(d_date_sk)) as 5.482456140350877E-4
I0610 17:09:25.739918 3972670 JoinNode.java:760] 
774dd75ed2b1fc53:c78b86b2] getSemiJoinCardinality has 
minSelectivity=5.482456140350877E-4
I0610 17:09:25.739933 3972670 JoinNode.java:762] 
774dd75ed2b1fc53:c78b86b2] Changed cardinality from 2880404 to 1579
I0610 17:09:25.739966 3972670 JoinNode.java:866] 
774dd75ed2b1fc53:c78b86b2] stats Join: cardinality=1579{code}
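
A minimal sketch of that fallback (hypothetical method and variable names; not 
the exact JoinNode.java change) could look like:
{code:java}
// Hypothetical sketch of the proposed fallback (not the actual JoinNode.java
// code): when the NDV of one side of an equality conjunct is unknown (-1) but
// its input cardinality is known, use the cardinality as an upper bound on the
// NDV instead of skipping the conjunct.
public class NdvFallbackSketch {
  static long effectiveNdv(long ndv, long cardinality) {
    if (ndv > -1) return ndv;                  // NDV known: use it directly
    if (cardinality > -1) return cardinality;  // NDV unknown: bound it by the row count
    return -1;                                 // both unknown: caller still skips this conjunct
  }

  public static void main(String[] args) {
    // E.g. the min(d_date_sk) + 1000 side of the semi join: its NDV is unknown,
    // but the aggregate produces a single row, so the effective NDV becomes 1.
    System.out.println(effectiveNdv(-1, 1));  // prints 1
  }
}
{code}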

> Equality predicate on partition column and uncorrelated subquery doesn't 
> reduce the cardinality estimate
> 
>
> Key: IMPALA-13077
> URL: https://issues.apache.org/jira/browse/IMPALA-13077
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> Let's say 'part_tbl' is a partitioned table. Its partition key is 'part_key'. 
> Consider the following query:
> {code:sql}
> select xxx from part_tbl
> where part_key=(select ... from dim_tbl);
> {code}
> Its query plan is a JoinNode with two ScanNodes. When estimating the 
> cardinality of the JoinNode, the planner is not aware that 'part_key' is the 
> partition column and the cardinality of the JoinNode should not be larger 
> than the max row count across partitions.
> The recent work in IMPALA-12018 (Consider runtime filter for cardinality 
> reduction) helps in some cases since there are runtime filters on the 
> partition column. But there are still some cases that we overestimate the 
> cardinality. For instance, 'ss_sold_date_sk' is the only partition key of 
> tpcds.store_sales. The following query
> {code:sql}
> select count(*) from tpcds.store_sales
> where ss_sold_date_sk=(
>   select min(d_date_sk) + 1000 from tpcds.date_dim);{code}
> has query plan:
> {noformat}
> +-+
> | Explain String  |
> +-+
> | Max Per-Host Resource Reservation: Memory=18.94MB Threads=6 |
> | Per-Host Resource Estimates: Memory=243MB   |
> | |
> | PLAN-ROOT SINK  |
> | |   |
> | 09:AGGREGATE [FINALIZE] |
> | |  output: count:merge(*)   |
> | |  row-size=8B cardinality=1|
> | |   |
> | 08:EXCHANGE [UNPARTITIONED] |
> | |   |
> | 04:AGGREGATE|
> | |  output: count(*) |
> | |  row-size=8B cardinality=1|
> | |   |
> | 03:HASH JOIN [LEFT SEMI JOIN, BROADCAST]|
> | |  hash predicates: ss_sold_date_sk = min(d_date_sk) + 1000 |
> | |  runtime filters: RF000 <- min(d_date_sk) + 1000  |
> | |  row-size=4B cardinality=2.88M  <-- Should be max(numRows) across 
> partitions
> | |   |
> | |--07:EXCHANGE [BROADCAST]  |
> | |  ||
> | |  06:AGGREGATE [FINALIZE]  |
> | |  |  output: min:merge(d_date_sk) 

[jira] [Updated] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-10 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13151:
---
Affects Version/s: Impala 4.5.0
   (was: Impala 4.4.0)

> DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
> -
>
> Key: IMPALA-13151
> URL: https://issues.apache.org/jira/browse/IMPALA-13151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
>
> The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
> failing with errors like this:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
> Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
> actual: 269834 vs 30{noformat}
> So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-10 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853824#comment-17853824
 ] 

Michael Smith commented on IMPALA-13151:


Oh, more likely that MonotonicStopWatch is less precise because 
https://github.com/apache/impala/blob/4.4.0/be/src/util/stopwatch.h#L159-L163.

> DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
> -
>
> Key: IMPALA-13151
> URL: https://issues.apache.org/jira/browse/IMPALA-13151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
>
> The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
> failing with errors like this:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
> Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
> actual: 269834 vs 30{noformat}
> So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-10 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13151 started by Michael Smith.
--
> DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
> -
>
> Key: IMPALA-13151
> URL: https://issues.apache.org/jira/browse/IMPALA-13151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
>
> The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
> failing with errors like this:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
> Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
> actual: 269834 vs 30{noformat}
> So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-10 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853778#comment-17853778
 ] 

Michael Smith commented on IMPALA-13151:


I'm tempted to make that a fuzzy comparison. Maybe the sleep method used for 
debug actions is slightly less precise than the timer.

> DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
> -
>
> Key: IMPALA-13151
> URL: https://issues.apache.org/jira/browse/IMPALA-13151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
>
> The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
> failing with errors like this:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
> Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
> actual: 269834 vs 30{noformat}
> So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-10 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-13151:
--

Assignee: Michael Smith

> DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
> -
>
> Key: IMPALA-13151
> URL: https://issues.apache.org/jira/browse/IMPALA-13151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
>
> The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
> failing with errors like this:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
> Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
> actual: 269834 vs 30{noformat}
> So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-10 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13151:
--

 Summary: DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on 
ARM
 Key: IMPALA-13151
 URL: https://issues.apache.org/jira/browse/IMPALA-13151
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.4.0
Reporter: Joe McDonnell


The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
failing with errors like this:
{noformat}
/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
actual: 269834 vs 30{noformat}
So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13126) ReloadEvent.isOlderEvent() should hold the table read lock

2024-06-10 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala updated IMPALA-13126:
---
Labels: catalog-2024  (was: )

> ReloadEvent.isOlderEvent() should hold the table read lock
> --
>
> Key: IMPALA-13126
> URL: https://issues.apache.org/jira/browse/IMPALA-13126
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Sai Hemanth Gantasala
>Priority: Critical
>  Labels: catalog-2024
>
> Saw an exception like this:
> {noformat}
> E0601 09:11:25.275251   246 MetastoreEventsProcessor.java:990] Unexpected 
> exception received while processing event
> Java exception follows:
> java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextNode(HashMap.java:1469)
> at java.util.HashMap$ValueIterator.next(HashMap.java:1498)
> at 
> org.apache.impala.catalog.FeFsTable$Utils.getPartitionFromThriftPartitionSpec(FeFsTable.java:616)
> at 
> org.apache.impala.catalog.HdfsTable.getPartitionFromThriftPartitionSpec(HdfsTable.java:597)
> at 
> org.apache.impala.catalog.Catalog.getHdfsPartition(Catalog.java:511)
> at 
> org.apache.impala.catalog.Catalog.getHdfsPartition(Catalog.java:489)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.isPartitionLoadedAfterEvent(CatalogServiceCatalog.java:4024)
> at 
> org.apache.impala.catalog.events.MetastoreEvents$ReloadEvent.isOlderEvent(MetastoreEvents.java:2754)
> at 
> org.apache.impala.catalog.events.MetastoreEvents$ReloadEvent.processTableEvent(MetastoreEvents.java:2729)
> at 
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.process(MetastoreEvents.java:1107)
> at 
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:531)
> at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1164)
> at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:972)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750) {noformat}
> For a partition-level RELOAD event, ReloadEvent.isOlderEvent() needs to check 
> whether the corresponding partition is reloaded after the event. This should 
> be done while holding the table read lock. Otherwise, EventProcessor could 
> hit the error above when there are concurrent DDLs/DMLs modifying the 
> partition list.
> CC [~VenuReddy]
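
A minimal sketch of the locking pattern the description asks for (hypothetical 
class and field names; not the actual MetastoreEvents code):
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch (not the actual MetastoreEvents/ReloadEvent code): the
// partition lookup runs only while the table's read lock is held, so a
// concurrent DDL/DML that rewrites the partition map cannot trigger a
// ConcurrentModificationException mid-lookup.
public class ReadLockedPartitionCheck {
  private final ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock();
  private final Map<String, Long> partitionLoadEventIds = new HashMap<>();

  boolean isPartitionLoadedAfterEvent(String partitionName, long eventId) {
    tableLock.readLock().lock();
    try {
      // Safe to read the partition map while the read lock is held.
      Long loadedAtEventId = partitionLoadEventIds.get(partitionName);
      return loadedAtEventId != null && loadedAtEventId > eventId;
    } finally {
      tableLock.readLock().unlock();
    }
  }
}
{code}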



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13146) Javascript tests sometimes fail to download NodeJS

2024-06-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853729#comment-17853729
 ] 

ASF subversion and git services commented on IMPALA-13146:
--

Commit e7dac008bbafb20e4c7d15d46f2bac9a757f in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e7dac008b ]

IMPALA-13146: Download NodeJS from native toolchain

Some test runs have had issues downloading the NodeJS
tarball from the nodejs servers. This changes the
test to download from our native toolchain to make this
more reliable. This means that future upgrades to
NodeJS will need to upload new tarballs to the native
toolchain.

Testing:
 - Ran x86_64/ARM javascript tests

Change-Id: I1def801469cb68633e89b4a0f3c07a771febe599
Reviewed-on: http://gerrit.cloudera.org:8080/21494
Tested-by: Impala Public Jenkins 
Reviewed-by: Surya Hebbar 
Reviewed-by: Wenzhe Zhou 


> Javascript tests sometimes fail to download NodeJS
> --
>
> Key: IMPALA-13146
> URL: https://issues.apache.org/jira/browse/IMPALA-13146
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> For automated tests, sometimes the Javascript tests fail to download NodeJS:
> {noformat}
> 01:37:16 Fetching NodeJS v16.20.2-linux-x64 binaries ...
> 01:37:16   % Total% Received % Xferd  Average Speed   TimeTime 
> Time  Current
> 01:37:16  Dload  Upload   Total   Spent
> Left  Speed
> 01:37:16 
>   0 00 00 0  0  0 --:--:-- --:--:-- --:--:-- 0
>   0 00 00 0  0  0 --:--:--  0:00:01 --:--:-- 0
>   0 00 00 0  0  0 --:--:--  0:00:02 --:--:-- 0
>   0 21.5M0   9020 0293  0 21:23:04  0:00:03 21:23:01   293
> ...
>  30 21.5M   30 6776k    0     0  50307      0  0:07:28  0:02:17  0:05:11 23826
> 01:39:34 curl: (18) transfer closed with 15617860 bytes remaining to 
> read{noformat}
> If this keeps happening, we should mirror the NodeJS binary on the 
> native-toolchain s3 bucket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13150) Possible buffer overflow in StringVal

2024-06-10 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-13150:
--

 Summary: Possible buffer overflow in StringVal
 Key: IMPALA-13150
 URL: https://issues.apache.org/jira/browse/IMPALA-13150
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Daniel Becker
Assignee: Daniel Becker


In {{{}StringVal::CopyFrom(){}}}, we take the 'len' parameter as a 
{{{}size_t{}}}, which is usually a 64-bit unsigned integer. We pass it to the 
constructor of {{{}StringVal{}}}, which takes it as an {{{}int{}}}, which is 
usually a 32-bit signed integer. The constructor then allocates memory for the 
length using the {{int}} value, but back in {{{}CopyFrom(){}}}, we copy the 
buffer with the {{size_t}} length. If {{size_t}} is indeed 64 bits and {{int}} 
is 32 bits, and the value is truncated, we may copy more bytes than we 
have allocated for the destination. See 
https://github.com/apache/impala/blob/ce8078204e5995277f79e226e26fe8b9eaca408b/be/src/udf/udf.cc#L546



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13149) Show JVM info in the WebUI

2024-06-09 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13149:
---

 Summary: Show JVM info in the WebUI
 Key: IMPALA-13149
 URL: https://issues.apache.org/jira/browse/IMPALA-13149
 Project: IMPALA
  Issue Type: New Feature
Reporter: Quanlong Huang


It'd be helpful to show the JVM info in the WebUI, e.g. show the output of 
"java -version":
{code:java}
openjdk version "1.8.0_412"
OpenJDK Runtime Environment (build 1.8.0_412-b08)
OpenJDK 64-Bit Server VM (build 25.412-b08, mixed mode){code}
On nodes that only have a JRE deployed, we'd like to deploy the same version of 
the JDK to perform heap dumps (jmap). Showing the JVM info in the WebUI will be 
useful for that.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13148) Show the number of in-progress Catalog operations

2024-06-09 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-13148:

Attachment: Selection_123.png
Selection_122.png

> Show the number of in-progress Catalog operations
> -
>
> Key: IMPALA-13148
> URL: https://issues.apache.org/jira/browse/IMPALA-13148
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Quanlong Huang
>Priority: Major
>  Labels: newbie, ramp-up
> Attachments: Selection_122.png, Selection_123.png
>
>
> In the /operations page of the catalogd WebUI, the list of In-progress Catalog 
> Operations is shown. It'd be helpful to also show the number of such 
> operations, similar to the /queries page of the coordinator WebUI, which shows 
> e.g. "100 queries in flight".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13148) Show the number of in-progress Catalog operations

2024-06-09 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13148:
---

 Summary: Show the number of in-progress Catalog operations
 Key: IMPALA-13148
 URL: https://issues.apache.org/jira/browse/IMPALA-13148
 Project: IMPALA
  Issue Type: Improvement
Reporter: Quanlong Huang
 Attachments: Selection_122.png, Selection_123.png

In the /operations page of the catalogd WebUI, the list of In-progress Catalog 
Operations is shown. It'd be helpful to also show the number of such 
operations, similar to the /queries page of the coordinator WebUI, which shows 
e.g. "100 queries in flight".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-13143) TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query failure

2024-06-09 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou resolved IMPALA-13143.
--
Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query 
> failure
> -
>
> Key: IMPALA-13143
> URL: https://issues.apache.org/jira/browse/IMPALA-13143
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Wenzhe Zhou
>Priority: Critical
>  Labels: broken-build, flaky
> Fix For: Impala 4.5.0
>
>
> The new TestCatalogdHA.test_catalogd_failover_with_sync_ddl test is failing 
> intermittently with:
> {noformat}
> custom_cluster/test_catalogd_ha.py:472: in 
> test_catalogd_failover_with_sync_ddl
> self.wait_for_state(handle, QueryState.EXCEPTION, 30, client=client)
> common/impala_test_suite.py:1216: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1234: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '9d49ab6360f6cbc5:4826a796' did not reach one of 
> the expected states [5], last known state 4{noformat}
> This means the query succeeded even though we expected it to fail. This is 
> currently limited to s3 jobs. In a different test, we saw issues because s3 
> is slower (see IMPALA-12616).
> This test was introduced by IMPALA-13134: 
> https://github.com/apache/impala/commit/70b7b6a78d49c30933d79e0a1c2a725f7e0a3e50



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12266) Sporadic failure after migrating a table to Iceberg

2024-06-09 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853512#comment-17853512
 ] 

Fang-Yu Rao commented on IMPALA-12266:
--

Encountered this failure again at 
[https://jenkins.impala.io/job/ubuntu-20.04-dockerised-tests/1873/testReport/junit/query_test.test_iceberg/TestIcebergTable/test_convert_table_protocol__beeswax___exec_optiontest_replan___1___batch_size___0___num_nodes___0___disable_codegen_rows_threshold___0___disable_codegen___False___abort_on_error___1___exec_single_node_rows_threshold___0table_format__parquet_none_/]
  in a Jenkins job against [https://gerrit.cloudera.org/c/21160/], which did 
not change Impala's behavior in this area.

> Sporadic failure after migrating a table to Iceberg
> ---
>
> Key: IMPALA-12266
> URL: https://issues.apache.org/jira/browse/IMPALA-12266
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Affects Versions: Impala 4.2.0
>Reporter: Tamas Mate
>Assignee: Gabor Kaszab
>Priority: Critical
>  Labels: impala-iceberg
> Attachments: 
> catalogd.bd40020df22b.invalid-user.log.INFO.20230704-181939.1, 
> impalad.6c0f48d9ce66.invalid-user.log.INFO.20230704-181940.1
>
>
> TestIcebergTable.test_convert_table test failed in a recent verify job's 
> dockerised tests:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/7629
> {code:none}
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: AnalysisException: Failed to load metadata for table: 
> 'parquet_nopartitioned'
> E   CAUSED BY: TableLoadingException: Could not load table 
> test_convert_table_cdba7383.parquet_nopartitioned from catalog
> E   CAUSED BY: TException: 
> TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, 
> error_msgs:[NullPointerException: null]), lookup_status:OK)
> {code}
> {code:none}
> E0704 19:09:22.980131   833 JniUtil.java:183] 
> 7145c21173f2c47b:2579db55] Error in Getting partial catalog object of 
> TABLE:test_convert_table_cdba7383.parquet_nopartitioned. Time spent: 49ms
> I0704 19:09:22.980309   833 jni-util.cc:288] 
> 7145c21173f2c47b:2579db55] java.lang.NullPointerException
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.replaceTableIfUnchanged(CatalogServiceCatalog.java:2357)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getOrLoadTable(CatalogServiceCatalog.java:2300)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3587)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3513)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3480)
>   at 
> org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$11(JniCatalog.java:397)
>   at 
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
>   at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109)
>   at 
> org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:238)
>   at 
> org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:396)
> I0704 19:09:22.980324   833 status.cc:129] 7145c21173f2c47b:2579db55] 
> NullPointerException: null
> @  0x1012f9f  impala::Status::Status()
> @  0x187f964  impala::JniUtil::GetJniExceptionMsg()
> @   0xfee920  impala::JniCall::Call<>()
> @   0xfccd0f  impala::Catalog::GetPartialCatalogObject()
> @   0xfb55a5  
> impala::CatalogServiceThriftIf::GetPartialCatalogObject()
> @   0xf7a691  
> impala::CatalogServiceProcessorT<>::process_GetPartialCatalogObject()
> @   0xf82151  impala::CatalogServiceProcessorT<>::dispatchCall()
> @   0xee330f  apache::thrift::TDispatchProcessor::process()
> @  0x1329246  
> apache::thrift::server::TAcceptQueueServer::Task::run()
> @  0x1315a89  impala::ThriftThread::RunRunnable()
> @  0x131773d  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @  0x195ba8c  impala::Thread::SuperviseThread()
> @  0x195c895  boost::detail

[jira] [Commented] (IMPALA-13143) TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query failure

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853328#comment-17853328
 ] 

ASF subversion and git services commented on IMPALA-13143:
--

Commit bafd1903069163f38812d7fa42f9c4d2f7218fcf in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bafd19030 ]

IMPALA-13143: Fix flaky test_catalogd_failover_with_sync_ddl

The test_catalogd_failover_with_sync_ddl test which was added to
custom_cluster/test_catalogd_ha.py in IMPALA-13134 failed on s3.
The test relies on specific timing with a sleep injected via a
debug action so that the DDL query is still running when catalogd
failover is triggered. The failures were caused by slowly restarting
for catalogd on s3 so that the query finished before catalogd
failover was triggered.

This patch fixed the issue by increasing the sleep time for s3 builds
and other slow builds.

Testing:
 - Ran the test 100 times in a loop on s3.

Change-Id: I15bb6aae23a2f544067f993533e322969372ebd5
Reviewed-on: http://gerrit.cloudera.org:8080/21491
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 


> TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query 
> failure
> -
>
> Key: IMPALA-13143
> URL: https://issues.apache.org/jira/browse/IMPALA-13143
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Wenzhe Zhou
>Priority: Critical
>  Labels: broken-build, flaky
>
> The new TestCatalogdHA.test_catalogd_failover_with_sync_ddl test is failing 
> intermittently with:
> {noformat}
> custom_cluster/test_catalogd_ha.py:472: in 
> test_catalogd_failover_with_sync_ddl
> self.wait_for_state(handle, QueryState.EXCEPTION, 30, client=client)
> common/impala_test_suite.py:1216: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1234: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '9d49ab6360f6cbc5:4826a796' did not reach one of 
> the expected states [5], last known state 4{noformat}
> This means the query succeeded even though we expected it to fail. This is 
> currently limited to s3 jobs. In a different test, we saw issues because s3 
> is slower (see IMPALA-12616).
> This test was introduced by IMPALA-13134: 
> https://github.com/apache/impala/commit/70b7b6a78d49c30933d79e0a1c2a725f7e0a3e50



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13134) DDL hang with SYNC_DDL enabled when Catalogd is changed to standby status

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853329#comment-17853329
 ] 

ASF subversion and git services commented on IMPALA-13134:
--

Commit bafd1903069163f38812d7fa42f9c4d2f7218fcf in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bafd19030 ]

IMPALA-13143: Fix flaky test_catalogd_failover_with_sync_ddl

The test_catalogd_failover_with_sync_ddl test which was added to
custom_cluster/test_catalogd_ha.py in IMPALA-13134 failed on s3.
The test relies on specific timing with a sleep injected via a
debug action so that the DDL query is still running when catalogd
failover is triggered. The failures were caused by slowly restarting
for catalogd on s3 so that the query finished before catalogd
failover was triggered.

This patch fixed the issue by increasing the sleep time for s3 builds
and other slow builds.

Testing:
 - Ran the test 100 times in a loop on s3.

Change-Id: I15bb6aae23a2f544067f993533e322969372ebd5
Reviewed-on: http://gerrit.cloudera.org:8080/21491
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 


> DDL hang with SYNC_DDL enabled when Catalogd is changed to standby status
> -
>
> Key: IMPALA-13134
> URL: https://issues.apache.org/jira/browse/IMPALA-13134
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Catalog
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> Catalogd waits for SYNC_DDL version when it processes a DDL with SYNC_DDL 
> enabled. If the status of Catalogd is changed from active to standby when 
> CatalogServiceCatalog.waitForSyncDdlVersion() is called, the standby catalogd 
> does not receive catalog topic updates from the statestore. This causes the 
> catalogd thread to wait indefinitely and the DDL query to hang.
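
One plausible shape for such a guard (hypothetical fields and names; not taken 
from the actual IMPALA-13134 fix) is a wait loop that also checks whether the 
catalogd is still active:
{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch only (not the actual CatalogServiceCatalog code): a
// SYNC_DDL-style wait loop that re-checks whether this catalogd is still the
// active instance, so the waiting thread can fail fast once topic updates
// stop arriving instead of blocking forever.
public class SyncDdlWaitSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition versionAdvanced = lock.newCondition();
  private long lastSentTopicVersion = 0;
  private volatile boolean isActive = true;

  boolean waitForSyncDdlVersion(long targetVersion) throws InterruptedException {
    lock.lock();
    try {
      while (lastSentTopicVersion < targetVersion) {
        if (!isActive) return false;  // became standby: give up instead of hanging
        versionAdvanced.await(100, TimeUnit.MILLISECONDS);
      }
      return true;
    } finally {
      lock.unlock();
    }
  }
}
{code}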



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-13146) Javascript tests sometimes fail to download NodeJS

2024-06-07 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-13146:
--

Assignee: Joe McDonnell

> Javascript tests sometimes fail to download NodeJS
> --
>
> Key: IMPALA-13146
> URL: https://issues.apache.org/jira/browse/IMPALA-13146
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> For automated tests, sometimes the Javascript tests fail to download NodeJS:
> {noformat}
> 01:37:16 Fetching NodeJS v16.20.2-linux-x64 binaries ...
> 01:37:16   % Total% Received % Xferd  Average Speed   TimeTime 
> Time  Current
> 01:37:16  Dload  Upload   Total   Spent
> Left  Speed
> 01:37:16 
>   0 00 00 0  0  0 --:--:-- --:--:-- --:--:-- 0
>   0 00 00 0  0  0 --:--:--  0:00:01 --:--:-- 0
>   0 00 00 0  0  0 --:--:--  0:00:02 --:--:-- 0
>   0 21.5M0   9020 0293  0 21:23:04  0:00:03 21:23:01   293
> ...
>  30 21.5M   30 6776k    0     0  50307      0  0:07:28  0:02:17  0:05:11 23826
> 01:39:34 curl: (18) transfer closed with 15617860 bytes remaining to 
> read{noformat}
> If this keeps happening, we should mirror the NodeJS binary on the 
> native-toolchain s3 bucket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13147) Add support for limiting the concurrency of link jobs

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13147:
--

 Summary: Add support for limiting the concurrency of link jobs
 Key: IMPALA-13147
 URL: https://issues.apache.org/jira/browse/IMPALA-13147
 Project: IMPALA
  Issue Type: Improvement
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


Link jobs can use a lot of memory due to the amount of debug info. The level of 
concurrency that is useful for compilation can be too high for linking. Running 
a link-heavy command like buildall.sh -skiptests can run out of memory from 
linking all of the backend tests / benchmarks.

It would be useful to be able to limit the number of concurrent link jobs. 
There are two basic approaches:

When using the ninja generator for CMake, ninja supports having job pools with 
limited parallelism. CMake has support for mapping link tasks to their own 
pool. Here is an example:
{noformat}
set(CMAKE_JOB_POOLS compilation_pool=24 link_pool=8)
set(CMAKE_JOB_POOL_COMPILE compilation_pool)
set(CMAKE_JOB_POOL_LINK link_pool){noformat}
The makefile generator does not have equivalent functionality, but we could do 
a more limited version where buildall.sh can split the -skiptests into two make 
invocations. The first does all the compilation with full parallelism 
(equivalent to -notests) and then the second make invocation does the backend 
tests / benchmarks with a reduced parallelism.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13096) Cleanup Parser.jj for Calcite planner to only use supported syntax

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853235#comment-17853235
 ] 

ASF subversion and git services commented on IMPALA-13096:
--

Commit 141f38197be2ca23757cb8b3f283cdb9dd62de47 in impala's branch 
refs/heads/master from Steve Carlin
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=141f38197 ]

IMPALA-12935: First pass on Calcite planner functions

This commit handles the first pass on getting functions to work
through the Calcite planner. Only basic functions will work with
this commit. Implicit conversions for parameters are not yet supported.
Custom UDFs are also not supported yet.

The ImpalaOperatorTable is used at validation time to check for
existence of the function name for Impala. At first, it will check
Calcite operators for the existence of the function name (A TODO,
IMPALA-13096, is that we need to remove non-supported names from the
parser file). It is preferable to use the Calcite Operator since
Calcite does some optimizations based on the Calcite Operator class.

If the name is not found within the Calcite Operators, a check is done
within the BuiltinsDb (TODO: IMPALA-13095 handle UDFs) for the function.
If found, and SqlOperator class is generated on the fly to handle this
function.

The validation process for Calcite includes a call into the operator
method "inferReturnType". This method will validate that there exists
a function that will handle the operands, and if so, return the "return
type" of the function. In this commit, we will assume that the Calcite
operators will match Impala functionality. In later commits, there
will be overrides where we will use Impala validation for operators
where Calcite's validation isn't good enough.

After validation is complete, the functions will be in a Calcite format.
After the rest of compilation (relnode conversion, optimization) is
complete, the function needs to be converted back into Impala form (the
Expr object) to eventually get it into its thrift request.

In this commit, all functions are converted into Expr starting in the
ImpalaProjectRel, since this is the RelNode where functions do their
thing. The RexCallConverter and RexLiteralConverter get called via the
CreateExprVisitor for this conversion.

Since Calcite is providing the analysis portion of the planning, there
is no need to go through Impala's Analyzer object. However, the Impala
planner requires Expr objects to be analyzed. To get around this, the
AnalyzedFunctionCallExpr and AnalyzedNullLiteral objects exist which
analyze the expression in the constructor. While this could potentially
be combined with the existing FunctionCallExpr and NullLiteral objects,
this fits in with the general plan to avoid changing "fe" Impala code
as much as we can until much later in the commit cycle. Also, there
will be other Analyzed*Expr classes created in the future, but this
commit is intended for basic function call expressions only.

One minor change to the parser is added with this commit. Calcite parser
does not have acknowledge the "string" datatype, so this has been
added here in Parser.jj and config.fmpp.

Change-Id: I2dd4e402d69ee10547abeeafe893164ffd789b88
Reviewed-on: http://gerrit.cloudera.org:8080/21357
Reviewed-by: Michael Smith 
Tested-by: Impala Public Jenkins 


> Cleanup Parser.jj for Calcite planner to only use supported syntax
> --
>
> Key: IMPALA-13096
> URL: https://issues.apache.org/jira/browse/IMPALA-13096
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Steve Carlin
>    Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13095) Handle UDFs in Calcite planner

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853236#comment-17853236
 ] 

ASF subversion and git services commented on IMPALA-13095:
--

Commit 141f38197be2ca23757cb8b3f283cdb9dd62de47 in impala's branch 
refs/heads/master from Steve Carlin
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=141f38197 ]

IMPALA-12935: First pass on Calcite planner functions

This commit handles the first pass on getting functions to work
through the Calcite planner. Only basic functions will work with
this commit. Implicit conversions for parameters are not yet supported.
Custom UDFs are also not supported yet.

The ImpalaOperatorTable is used at validation time to check for
existence of the function name for Impala. At first, it will check
Calcite operators for the existence of the function name (A TODO,
IMPALA-13096, is that we need to remove non-supported names from the
parser file). It is preferable to use the Calcite Operator since
Calcite does some optimizations based on the Calcite Operator class.

If the name is not found within the Calcite Operators, a check is done
within the BuiltinsDb (TODO: IMPALA-13095 handle UDFs) for the function.
If found, and SqlOperator class is generated on the fly to handle this
function.

The validation process for Calcite includes a call into the operator
method "inferReturnType". This method will validate that there exists
a function that will handle the operands, and if so, return the "return
type" of the function. In this commit, we will assume that the Calcite
operators will match Impala functionality. In later commits, there
will be overrides where we will use Impala validation for operators
where Calcite's validation isn't good enough.

After validation is complete, the functions will be in a Calcite format.
After the rest of compilation (relnode conversion, optimization) is
complete, the function needs to be converted back into Impala form (the
Expr object) to eventually get it into its thrift request.

In this commit, all functions are converted into Expr starting in the
ImpalaProjectRel, since this is the RelNode where functions do their
thing. The RexCallConverter and RexLiteralConverter get called via the
CreateExprVisitor for this conversion.

Since Calcite is providing the analysis portion of the planning, there
is no need to go through Impala's Analyzer object. However, the Impala
planner requires Expr objects to be analyzed. To get around this, the
AnalyzedFunctionCallExpr and AnalyzedNullLiteral objects exist which
analyze the expression in the constructor. While this could potentially
be combined with the existing FunctionCallExpr and NullLiteral objects,
this fits in with the general plan to avoid changing "fe" Impala code
as much as we can until much later in the commit cycle. Also, there
will be other Analyzed*Expr classes created in the future, but this
commit is intended for basic function call expressions only.

One minor change to the parser is added with this commit. Calcite parser
does not have acknowledge the "string" datatype, so this has been
added here in Parser.jj and config.fmpp.

Change-Id: I2dd4e402d69ee10547abeeafe893164ffd789b88
Reviewed-on: http://gerrit.cloudera.org:8080/21357
Reviewed-by: Michael Smith 
Tested-by: Impala Public Jenkins 


> Handle UDFs in Calcite planner
> --
>
> Key: IMPALA-13095
> URL: https://issues.apache.org/jira/browse/IMPALA-13095
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Steve Carlin
>    Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12935) Allow function parsing for Impala Calcite planner

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853234#comment-17853234
 ] 

ASF subversion and git services commented on IMPALA-12935:
--

Commit 141f38197be2ca23757cb8b3f283cdb9dd62de47 in impala's branch 
refs/heads/master from Steve Carlin
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=141f38197 ]

IMPALA-12935: First pass on Calcite planner functions

This commit handles the first pass on getting functions to work
through the Calcite planner. Only basic functions will work with
this commit. Implicit conversions for parameters are not yet supported.
Custom UDFs are also not supported yet.

The ImpalaOperatorTable is used at validation time to check for
existence of the function name for Impala. At first, it will check
Calcite operators for the existence of the function name (A TODO,
IMPALA-13096, is that we need to remove non-supported names from the
parser file). It is preferable to use the Calcite Operator since
Calcite does some optimizations based on the Calcite Operator class.

If the name is not found within the Calcite Operators, a check is done
within the BuiltinsDb (TODO: IMPALA-13095 handle UDFs) for the function.
If found, and SqlOperator class is generated on the fly to handle this
function.

The validation process for Calcite includes a call into the operator
method "inferReturnType". This method will validate that there exists
a function that will handle the operands, and if so, return the "return
type" of the function. In this commit, we will assume that the Calcite
operators will match Impala functionality. In later commits, there
will be overrides where we will use Impala validation for operators
where Calcite's validation isn't good enough.

After validation is complete, the functions will be in a Calcite format.
After the rest of compilation (relnode conversion, optimization) is
complete, the function needs to be converted back into Impala form (the
Expr object) to eventually get it into its thrift request.

In this commit, all functions are converted into Expr starting in the
ImpalaProjectRel, since this is the RelNode where functions do their
thing. The RexCallConverter and RexLiteralConverter get called via the
CreateExprVisitor for this conversion.

Since Calcite is providing the analysis portion of the planning, there
is no need to go through Impala's Analyzer object. However, the Impala
planner requires Expr objects to be analyzed. To get around this, the
AnalyzedFunctionCallExpr and AnalyzedNullLiteral objects exist which
analyze the expression in the constructor. While this could potentially
be combined with the existing FunctionCallExpr and NullLiteral objects,
this fits in with the general plan to avoid changing "fe" Impala code
as much as we can until much later in the commit cycle. Also, there
will be other Analyzed*Expr classes created in the future, but this
commit is intended for basic function call expressions only.

One minor change to the parser is added with this commit. Calcite parser
does not have acknowledge the "string" datatype, so this has been
added here in Parser.jj and config.fmpp.

Change-Id: I2dd4e402d69ee10547abeeafe893164ffd789b88
Reviewed-on: http://gerrit.cloudera.org:8080/21357
Reviewed-by: Michael Smith 
Tested-by: Impala Public Jenkins 


> Allow function parsing for Impala Calcite planner
> -
>
> Key: IMPALA-12935
> URL: https://issues.apache.org/jira/browse/IMPALA-12935
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Steve Carlin
>Priority: Major
>
> We need the ability to parse and validate Impala functions using the Calcite 
> planner.
> This commit is not intended to work for all functions, or even most 
> functions.  It will work as a base to be reviewed, and at least some 
> functions will work.  More complicated functions will be added in a later 
> commit.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13146) Javascript tests sometimes fail to download NodeJS

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13146:
--

 Summary: Javascript tests sometimes fail to download NodeJS
 Key: IMPALA-13146
 URL: https://issues.apache.org/jira/browse/IMPALA-13146
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


For automated tests, sometimes the Javascript tests fail to download NodeJS:
{noformat}
01:37:16 Fetching NodeJS v16.20.2-linux-x64 binaries ...
01:37:16   % Total% Received % Xferd  Average Speed   TimeTime Time 
 Current
01:37:16  Dload  Upload   Total   SpentLeft 
 Speed
01:37:16 
  0 00 00 0  0  0 --:--:-- --:--:-- --:--:-- 0
  0 00 00 0  0  0 --:--:--  0:00:01 --:--:-- 0
  0 00 00 0  0  0 --:--:--  0:00:02 --:--:-- 0
  0 21.5M0   9020 0293  0 21:23:04  0:00:03 21:23:01   293
...
 30 21.5M   30 6776k    0     0  50307      0  0:07:28  0:02:17  0:05:11 23826
01:39:34 curl: (18) transfer closed with 15617860 bytes remaining to 
read{noformat}
If this keeps happening, we should mirror the NodeJS binary on the 
native-toolchain s3 bucket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-13130) Under heavy load, Impala does not prioritize data stream operations

2024-06-07 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13130.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Under heavy load, Impala does not prioritize data stream operations
> ---
>
> Key: IMPALA-13130
> URL: https://issues.apache.org/jira/browse/IMPALA-13130
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> Under heavy load - where Impala reaches max memory for the DataStreamService 
> and applies backpressure via 
> https://github.com/apache/impala/blob/4.4.0/be/src/rpc/impala-service-pool.cc#L191-L199
>  - DataStreamService does not differentiate between types of requests and may 
> reject requests that could help reduce load.
> The DataStreamService deals with TransmitData, PublishFilter, UpdateFilter, 
> UpdateFilterFromRemote, and EndDataStream. It seems like we should prioritize 
> completing EndDataStream, especially under heavy load, to complete work and 
> release resources more quickly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-13143) TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query failure

2024-06-07 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou reassigned IMPALA-13143:


Assignee: Wenzhe Zhou

> TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query 
> failure
> -
>
> Key: IMPALA-13143
> URL: https://issues.apache.org/jira/browse/IMPALA-13143
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Wenzhe Zhou
>Priority: Critical
>  Labels: broken-build, flaky
>
> The new TestCatalogdHA.test_catalogd_failover_with_sync_ddl test is failing 
> intermittently with:
> {noformat}
> custom_cluster/test_catalogd_ha.py:472: in 
> test_catalogd_failover_with_sync_ddl
> self.wait_for_state(handle, QueryState.EXCEPTION, 30, client=client)
> common/impala_test_suite.py:1216: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1234: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '9d49ab6360f6cbc5:4826a796' did not reach one of 
> the expected states [5], last known state 4{noformat}
> This means the query succeeded even though we expected it to fail. This is 
> currently limited to s3 jobs. In a different test, we saw issues because s3 
> is slower (see IMPALA-12616).
> This test was introduced by IMPALA-13134: 
> https://github.com/apache/impala/commit/70b7b6a78d49c30933d79e0a1c2a725f7e0a3e50



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13145) Upgrade mold linker to 2.31.0

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13145:
--

 Summary: Upgrade mold linker to 2.31.0
 Key: IMPALA-13145
 URL: https://issues.apache.org/jira/browse/IMPALA-13145
 Project: IMPALA
  Issue Type: Improvement
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


Mold 2.31.0 claims performance improvements and a reduction in the memory 
needed for linking. See [https://github.com/rui314/mold/releases/tag/v2.31.0] 
and 
[https://github.com/rui314/mold/commit/53ebcd80d888778cde16952270f73343f090f342]

We should move to that version as some developers are seeing issues with high 
memory usage for linking.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12967) Testcase fails at test_migrated_table_field_id_resolution due to "Table does not exist"

2024-06-07 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853224#comment-17853224
 ] 

Joe McDonnell commented on IMPALA-12967:


There is a separate symptom where this test fails with a Disk I/O error. It is 
probably somewhat related, so we need to decide whether to include that symptom 
here. See IMPALA-13144.

> Testcase fails at test_migrated_table_field_id_resolution due to "Table does 
> not exist"
> ---
>
> Key: IMPALA-12967
> URL: https://issues.apache.org/jira/browse/IMPALA-12967
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Yida Wu
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: broken-build
>
> Testcase test_migrated_table_field_id_resolution fails at exhaustive release 
> build with following messages:
> *Regression*
> {code:java}
> query_test.test_iceberg.TestIcebergTable.test_migrated_table_field_id_resolution[protocol:
>  beeswax | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none] (from pytest)
> {code}
> *Error Message*
> {code:java}
> query_test/test_iceberg.py:266: in test_migrated_table_field_id_resolution
>  "iceberg_migrated_alter_test_orc", "orc") common/file_utils.py:68: in 
> create_iceberg_table_from_directory file_format)) 
> common/impala_connection.py:215: in execute 
> fetch_profile_after_close=fetch_profile_after_close) 
> beeswax/impala_beeswax.py:191: in execute handle = 
> self.__execute_query(query_string.strip(), user=user) 
> beeswax/impala_beeswax.py:384: in __execute_query 
> self.wait_for_finished(handle) beeswax/impala_beeswax.py:405: in 
> wait_for_finished raise ImpalaBeeswaxException("Query aborted:" + 
> error_log, None) E   ImpalaBeeswaxException: ImpalaBeeswaxException: E
> Query aborted:ImpalaRuntimeException: Error making 'createTable' RPC to Hive 
> Metastore:  E   CAUSED BY: IcebergTableLoadingException: Table does not exist 
> at location: 
> hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test_orc
> Stacktrace
> query_test/test_iceberg.py:266: in test_migrated_table_field_id_resolution
> "iceberg_migrated_alter_test_orc", "orc")
> common/file_utils.py:68: in create_iceberg_table_from_directory
> file_format))
> common/impala_connection.py:215: in execute
> fetch_profile_after_close=fetch_profile_after_close)
> beeswax/impala_beeswax.py:191: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:384: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:405: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:ImpalaRuntimeException: Error making 'createTable' RPC to 
> Hive Metastore: 
> E   CAUSED BY: IcebergTableLoadingException: Table does not exist at 
> location: 
> hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test_orc
> {code}
> *Standard Error*
> {code:java}
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_migrated_table_field_id_resolution[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':True;'abort_on_error':1;'exec_single_;
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_migrated_table_field_id_resolution_b59d79db` 
> CASCADE;
> -- 2024-04-02 00:56:55,137 INFO MainThread: Started query 
> f34399a8b7cddd67:031a3b96
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_migrated_table_field_id_resolution[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':True;'abort_on_error':1;'exec_single_;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_migrated_table_field_id_resolution_b59d79db`;
> -- 2024-04-02 00:56:57,302 INFO MainThread: Started query 
> 94465af69907eac5:e33f17e0
> -- 2024-04-02 00:56:57,353 INFO MainThread: Created database 
> "test_migrated_table_field_id_resolution_b59d79db" for test ID 
> "query_test/test_iceber

[jira] [Commented] (IMPALA-13144) TestIcebergTable.test_migrated_table_field_id_resolution fails with Disk I/O error

2024-06-07 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853223#comment-17853223
 ] 

Joe McDonnell commented on IMPALA-13144:


We need to decide whether we want to track this with IMPALA-12967 (which was 
originally about "Table does not exist at location" on the same test) or keep 
it separate.

> TestIcebergTable.test_migrated_table_field_id_resolution fails with Disk I/O 
> error
> --
>
> Key: IMPALA-13144
> URL: https://issues.apache.org/jira/browse/IMPALA-13144
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> A couple test jobs hit a failure on 
> TestIcebergTable.test_migrated_table_field_id_resolution:
> {noformat}
> query_test/test_iceberg.py:270: in test_migrated_table_field_id_resolution
> vector, unique_database)
> common/impala_test_suite.py:725: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:660: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:1013: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:216: in execute
> fetch_profile_after_close=fetch_profile_after_close)
> beeswax/impala_beeswax.py:191: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:384: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:405: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Disk I/O error on 
> impala-ec2-centos79-m6i-4xlarge-xldisk-153e.vpc.cloudera.com:27000: Failed to 
> open HDFS file 
> hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test/00_0
> E   Error(2): No such file or directory
> E   Root cause: RemoteException: File does not exist: 
> /test-warehouse/iceberg_migrated_alter_test/00_0
> E at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
> E at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
> E at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
> E at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040)
> E at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738)
> E at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
> E at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> E at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
> E at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> E at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
> E at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
> E at java.security.AccessController.doPrivileged(Native Method)
> E at javax.security.auth.Subject.doAs(Subject.java:422)
> E at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> E at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13144) TestIcebergTable.test_migrated_table_field_id_resolution fails with Disk I/O error

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13144:
--

 Summary: TestIcebergTable.test_migrated_table_field_id_resolution 
fails with Disk I/O error
 Key: IMPALA-13144
 URL: https://issues.apache.org/jira/browse/IMPALA-13144
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


A couple test jobs hit a failure on 
TestIcebergTable.test_migrated_table_field_id_resolution:
{noformat}
query_test/test_iceberg.py:270: in test_migrated_table_field_id_resolution
vector, unique_database)
common/impala_test_suite.py:725: in run_test_case
result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:660: in __exec_in_impala
result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:1013: in __execute_query
return impalad_client.execute(query, user=user)
common/impala_connection.py:216: in execute
fetch_profile_after_close=fetch_profile_after_close)
beeswax/impala_beeswax.py:191: in execute
handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:384: in __execute_query
self.wait_for_finished(handle)
beeswax/impala_beeswax.py:405: in wait_for_finished
raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
EQuery aborted:Disk I/O error on 
impala-ec2-centos79-m6i-4xlarge-xldisk-153e.vpc.cloudera.com:27000: Failed to 
open HDFS file 
hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test/00_0
E   Error(2): No such file or directory
E   Root cause: RemoteException: File does not exist: 
/test-warehouse/iceberg_migrated_alter_test/00_0
E   at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
E   at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
E   at 
org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
E   at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040)
E   at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738)
E   at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
E   at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
E   at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
E   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
E   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
E   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
E   at java.security.AccessController.doPrivileged(Native Method)
E   at javax.security.auth.Subject.doAs(Subject.java:422)
E   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
E   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13143) TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query failure

2024-06-07 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13143:
--

 Summary: TestCatalogdHA.test_catalogd_failover_with_sync_ddl times 
out expecting query failure
 Key: IMPALA-13143
 URL: https://issues.apache.org/jira/browse/IMPALA-13143
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


The new TestCatalogdHA.test_catalogd_failover_with_sync_ddl test is failing 
intermittently with:
{noformat}
custom_cluster/test_catalogd_ha.py:472: in test_catalogd_failover_with_sync_ddl
self.wait_for_state(handle, QueryState.EXCEPTION, 30, client=client)
common/impala_test_suite.py:1216: in wait_for_state
self.wait_for_any_state(handle, [expected_state], timeout, client)
common/impala_test_suite.py:1234: in wait_for_any_state
raise Timeout(timeout_msg)
E   Timeout: query '9d49ab6360f6cbc5:4826a796' did not reach one of the 
expected states [5], last known state 4{noformat}
This means the query succeeded even though we expected it to fail. This is 
currently limited to s3 jobs. In a different test, we saw issues because s3 is 
slower (see IMPALA-12616).

This test was introduced by IMPALA-13134: 
https://github.com/apache/impala/commit/70b7b6a78d49c30933d79e0a1c2a725f7e0a3e50



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states

2024-06-07 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-12616.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

I think the s3 slowness version of this is fixed, so I'm going to resolve this.

> test_restart_catalogd_while_handling_rpc_response* tests fail not reaching 
> expected states
> --
>
> Key: IMPALA-12616
> URL: https://issues.apache.org/jira/browse/IMPALA-12616
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.4.2
>Reporter: Andrew Sherman
>Assignee: Daniel Becker
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> There are failures in both 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout
>  and 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters,
>  both look the same:
> {code:java}
> custom_cluster/test_restart_services.py:232: in 
> test_restart_catalogd_while_handling_rpc_response_with_timeout
> self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], 
> max_wait_time)
> common/impala_test_suite.py:1181: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1199: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of 
> the expected states [4], last known state 5
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12322) return wrong timestamp when scan kudu timestamp with timezone

2024-06-07 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853203#comment-17853203
 ] 

Csaba Ringhofer commented on IMPALA-12322:
--

Thanks for the feedback [~eyizoha]. I have uploaded a patch that adds a new 
query option: https://gerrit.cloudera.org/#/c/21492/

> return wrong timestamp when scan kudu timestamp with timezone
> -
>
> Key: IMPALA-12322
> URL: https://issues.apache.org/jira/browse/IMPALA-12322
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1.1
> Environment: impala 4.1.1
>Reporter: daicheng
>Assignee: Zihao Ye
>Priority: Major
> Attachments: image-2022-04-24-00-01-05-746-1.png, 
> image-2022-04-24-00-01-05-746.png, image-2022-04-24-00-01-37-520.png, 
> image-2022-04-24-00-03-14-467-1.png, image-2022-04-24-00-03-14-467.png, 
> image-2022-04-24-00-04-16-240-1.png, image-2022-04-24-00-04-16-240.png, 
> image-2022-04-24-00-04-52-860-1.png, image-2022-04-24-00-04-52-860.png, 
> image-2022-04-24-00-05-52-086-1.png, image-2022-04-24-00-05-52-086.png, 
> image-2022-04-24-00-07-09-776-1.png, image-2022-04-24-00-07-09-776.png, 
> image-2023-07-28-20-31-09-457.png, image-2023-07-28-22-27-38-521.png, 
> image-2023-07-28-22-29-40-083.png, image-2023-07-28-22-36-17-460.png, 
> image-2023-07-28-22-36-37-884.png, image-2023-07-28-22-38-19-728.png
>
>
> The Impala version is 3.1.0-cdh6.1.
> I have set the system timezone to Asia/Shanghai:
> !image-2022-04-24-00-01-37-520.png!
> !image-2022-04-24-00-01-05-746.png!
> Here is the bug:
> *Step 1*
> I have a Parquet file with two columns, shown below, and read it with 
> impala-shell and Spark (timezone=Asia/Shanghai):
> !image-2022-04-24-00-03-14-467.png|width=1016,height=154!
> !image-2022-04-24-00-04-16-240.png|width=944,height=367!
> Both results are exactly right.
> *Step 2*
> Create a Kudu table with impala-shell:
> CREATE TABLE default.test_{_}test{_}_test_time2 (id BIGINT, t 
> TIMESTAMP, PRIMARY KEY (id)) STORED AS KUDU;
> Note: the Kudu version is 1.8.
> Then insert 2 rows into the table with Spark:
> !image-2022-04-24-00-04-52-860.png|width=914,height=279!
> *Step 3*
> Read it with Spark (timezone=Asia/Shanghai); Spark reads the Kudu table through 
> the kudu-client API. Here is the result:
> !image-2022-04-24-00-05-52-086.png|width=914,height=301!
> The result is still exactly right.
> But when reading it with impala-shell:
> !image-2022-04-24-00-07-09-776.png|width=915,height=154!
> the result is 8 hours late.
> *Conclusion*
> It seems that the Impala timezone setting does not take effect when the Kudu 
> column type is TIMESTAMP, although it works fine for the Parquet file. I don't 
> know why.
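
The 8-hour gap is consistent with one reader applying the Asia/Shanghai conversion 
to the stored UTC microseconds and the other rendering the raw instant. A small, 
plain-Python illustration of that difference follows; it is not Impala or Kudu 
code, and the epoch value is arbitrary:
{code:python}
# Kudu stores TIMESTAMP as UNIXTIME_MICROS (microseconds since the Unix epoch, UTC).
# The same stored value renders 8 hours apart depending on whether the
# Asia/Shanghai conversion is applied.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

stored_micros = 1_650_672_000_000_000  # arbitrary value written by some client

utc_view = datetime.fromtimestamp(stored_micros / 1_000_000, tz=timezone.utc)
local_view = utc_view.astimezone(ZoneInfo("Asia/Shanghai"))

print(utc_view)    # 2022-04-23 00:00:00+00:00  (raw instant, no conversion)
print(local_view)  # 2022-04-23 08:00:00+08:00  (session timezone applied)
{code}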



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853196#comment-17853196
 ] 

ASF subversion and git services commented on IMPALA-12616:
--

Commit 1935f9e1a199c958c5fb12ad53277fa720d6ae5c in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1935f9e1a ]

IMPALA-12616: Fix test_restart_services.py::TestRestart tests for S3

The test_restart_catalogd_while_handling_rpc_response* tests
from custom_cluster/test_restart_services.py have been failing
consistently on s3. The alter table statement is expected to
succeed, but instead it fails with:
"CatalogException: Detected catalog service ID changes"
This manifests as a timeout waiting for the statement to reach
the finished state.

The test relies on specific timing with a sleep injected via a
debug action. The failure stems from the catalog being slower
on s3. The alter table wakes up before the catalog service ID
change has fully completed, and it fails when it sees the
catalog service ID change.

This increases two sleep times:
1. This increases the sleep time before restarting the catalogd
   from 0.5 seconds to 5 seconds. This gives the catalogd longer
   to receive the message about the alter table and respond back
   to the impalad.
2. This increases the WAIT_BEFORE_PROCESSING_CATALOG_UPDATE
   sleep from 10 seconds to 30 seconds so the alter table
   statement doesn't wake up until the catalog service ID change
   is finalized.
The test is verifying that the right messages are in the impalad
logs, so we know this is still testing the same condition.

This modifies the tests to use wait_for_finished_timeout()
rather than wait_for_state(). This bails out immediately if the
query fails rather than waiting unnecessarily for the full timeout.
This also clears the query options so that later statements
don't inherit the debug_action that the alter table statement
used.

Testing:
 - Ran the tests 100x in a loop on s3
 - Ran the tests 100x in a loop on HDFS

Change-Id: Ieb5699b8fb0b2ad8bad4ac30922a7b4d7fa17d29
Reviewed-on: http://gerrit.cloudera.org:8080/21485
Tested-by: Impala Public Jenkins 
Reviewed-by: Daniel Becker 
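
As a rough sketch of the timing changes the commit describes (not the committed 
patch itself): it assumes Impala's python test framework, and while 
wait_for_finished_timeout() and clear_configuration() are named in this thread, 
their exact signatures and the constant values below are assumptions.
{code:python}
import time

from tests.common.custom_cluster_test_suite import CustomClusterTestSuite


class TestRestartTimingSketch(CustomClusterTestSuite):
  WAIT_FOR_CATALOG_UPDATE_TIMEOUT_SEC = 30  # assumed value

  def test_restart_catalogd_during_alter(self, unique_database):
    tbl_name = unique_database + ".restart_tbl"
    self.execute_query("create table {} (i int)".format(tbl_name))

    sleep_sec = 30  # raised from 10s so the catalog service ID change finishes first
    debug_action = ("WAIT_BEFORE_PROCESSING_CATALOG_UPDATE:SLEEP@{}"
                    .format(sleep_sec * 1000))
    handle = self.execute_query_async(
        "alter table {} add columns (age int)".format(tbl_name),
        query_options={"debug_action": debug_action})

    time.sleep(5)  # raised from 0.5s: let the catalogd hear about the ALTER first
    self.cluster.catalogd.restart()

    # Bail out as soon as the query finishes or fails instead of always waiting
    # out the full timeout.
    max_wait = sleep_sec + self.WAIT_FOR_CATALOG_UPDATE_TIMEOUT_SEC + 10
    assert self.client.wait_for_finished_timeout(handle, max_wait)

    # Don't let later statements inherit debug_action (see IMPALA-13139).
    self.client.clear_configuration()
{code}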


> test_restart_catalogd_while_handling_rpc_response* tests fail not reaching 
> expected states
> --
>
> Key: IMPALA-12616
> URL: https://issues.apache.org/jira/browse/IMPALA-12616
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.4.2
>Reporter: Andrew Sherman
>Assignee: Daniel Becker
>Priority: Critical
>
> There are failures in both 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout
>  and 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters,
>  both look the same:
> {code:java}
> custom_cluster/test_restart_services.py:232: in 
> test_restart_catalogd_while_handling_rpc_response_with_timeout
> self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], 
> max_wait_time)
> common/impala_test_suite.py:1181: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1199: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of 
> the expected states [4], last known state 5
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13142) Documentation for Impala StateStore HA

2024-06-07 Thread Sanjana Malhotra (Jira)
Sanjana Malhotra created IMPALA-13142:
-

 Summary: Documentation for Impala StateStore HA
 Key: IMPALA-13142
 URL: https://issues.apache.org/jira/browse/IMPALA-13142
 Project: IMPALA
  Issue Type: Documentation
Reporter: Sanjana Malhotra
Assignee: Sanjana Malhotra


IMPALA-12156



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13137) Add additional client fetch metrics columns to the queries page

2024-06-07 Thread Surya Hebbar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853096#comment-17853096
 ] 

Surya Hebbar commented on IMPALA-13137:
---

It was confirmed in the meeting that the expected column is the 
{{{}ClientFetchWaitTimer{}}} value, not the difference between "First row 
fetched" and "Last row fetched".

> Add additional client fetch metrics columns to the queries page
> ---
>
> Key: IMPALA-13137
> URL: https://issues.apache.org/jira/browse/IMPALA-13137
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, be
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
> Attachments: completed_query.png, in_flight_query_1.png, 
> in_flight_query_2.png, in_flight_query_3.png, very_short_fetch_timer.png
>
>
> To help users better understand query execution times, it would be useful to 
> add the following columns on the queries page.
> * First row fetched time - Time taken for the client to fetch the first row
> * Client fetch wait time - Time taken for the client to fetch all rows
> Additional details -
> https://jira.cloudera.com/browse/DWX-18295



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13141) Partition transactional table is not updated on alter partition when hms_event_incremental_refresh_transactional_table is disabled

2024-06-07 Thread Venugopal Reddy K (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K updated IMPALA-13141:
---
Description: 
Partition transactional table is not updated on alter partition when 
hms_event_incremental_refresh_transactional_table is disabled. 

*Observations:*

1. In case of AlterPartitionEvent, this issue occurs when 
hms_event_incremental_refresh_transactional_table is disabled.

2. In case of BatchPartitionEvent (when more than one AlterPartitionEvent is 
batched together), this issue occurs without disabling 
hms_event_incremental_refresh_transactional_table.

*Steps to reproduce:*

1. Create a partitioned table and add some partitions from Hive:

Note: This step can be done from Impala too.
{code:java}
0: jdbc:hive2://localhost:11050> create table s(i int, j int, p int);
0: jdbc:hive2://localhost:11050> insert into s values(1,10,100),(2,20,200);

{code}
{code:java}
0: jdbc:hive2://localhost:11050> create table test1(i int, j int) partitioned 
by(p int) tblproperties ('transactional'='true', 
'transactional_properties'='insert_only');
0: jdbc:hive2://localhost:11050> set hive.exec.dynamic.partition.mode=nonstrict;
0: jdbc:hive2://localhost:11050> insert into test partition(p) select * from s;
0: jdbc:hive2://localhost:11050> show partitions test;
+------------+
| partition  |
+------------+
| p=100      |
| p=200      |
+------------+
0: jdbc:hive2://localhost:11050> desc formatted test partition(p=100);
+-----------------------------------+-------------------------------------------------------------+-----------------------+
|             col_name              |                          data_type                          |        comment        |
+-----------------------------------+-------------------------------------------------------------+-----------------------+
| i                                 | int                                                         |                       |
| j                                 | int                                                         |                       |
|                                   | NULL                                                        | NULL                  |
| # Partition Information           | NULL                                                        | NULL                  |
| # col_name                        | data_type                                                   | comment               |
| p                                 | int                                                         |                       |
|                                   | NULL                                                        | NULL                  |
| # Detailed Partition Information  | NULL                                                        | NULL                  |
| Partition Value:                  | [100]                                                       | NULL                  |
| Database:                         | default                                                     | NULL                  |
| Table:                            | test                                                        | NULL                  |
| CreateTime:                       | Fri Jun 07 14:21:17 IST 2024                                | NULL                  |
| LastAccessTime:                   | UNKNOWN                                                     | NULL                  |
| Location:                         | hdfs://localhost:20500/test-warehouse/managed/test/p=100   | NULL                  |
| Partition Parameters:             | NULL                                                        | NULL                  |
|                                   | numFiles                                                    | 1                     |
|                                   | totalSize                                                   | 5                     |
|                                   | transient_lastDdlTime                                       | 1717750277            |
|                                   | NULL                                                        | NULL                  |
| # Storage Information             | NULL                                                        | NULL                  |
| SerDe Library:                    | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          | NULL                  |
| InputFormat:                      | org.apache.hadoop.mapred.TextInputFormat                    | NULL                  |
| OutputFormat:                     | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  | NULL                  |
| Compressed:                       | No                                                          | NULL                  |
| Num Buckets:                      | -1                                                          | NULL                  |
| Bucket Columns:                   | []                                                          | NULL

[jira] [Updated] (IMPALA-13141) Partition transactional table is not updated on alter partition when hms_event_incremental_refresh_transactional_table is disabled

2024-06-07 Thread Venugopal Reddy K (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K updated IMPALA-13141:
---
Description: 
Partition transactional table is not updated on alter partition when 
hms_event_incremental_refresh_transactional_table is disabled. 

*Observations:*

1. In case of AlterPartitionEvent, this issue occurs when 
hms_event_incremental_refresh_transactional_table is disabled.

2. In case of BatchPartitionEvent (when more than one AlterPartitionEvent is 
batched together), this issue occurs without disabling 
hms_event_incremental_refresh_transactional_table.

*Steps to reproduce:*

1. Create a partitioned table and add some partitions from Hive:

Note: This step can be done from Impala too.
{code:java}
0: jdbc:hive2://localhost:11050> create table s(i int, j int, p int);
0: jdbc:hive2://localhost:11050> insert into s values(1,10,100),(2,20,200);

{code}
{code:java}
0: jdbc:hive2://localhost:11050> create table test1(i int, j int) partitioned 
by(p int) tblproperties ('transactional'='true', 
'transactional_properties'='insert_only');
0: jdbc:hive2://localhost:11050> set hive.exec.dynamic.partition.mode=nonstrict;
0: jdbc:hive2://localhost:11050> insert into test partition(p) select * from s;
0: jdbc:hive2://localhost:11050> show partitions test;
+------------+
| partition  |
+------------+
| p=100      |
| p=200      |
+------------+
0: jdbc:hive2://localhost:11050> desc formatted test partition(p=100);
+-----------------------------------+-------------------------------------------------------------+-----------------------+
|             col_name              |                          data_type                          |        comment        |
+-----------------------------------+-------------------------------------------------------------+-----------------------+
| i                                 | int                                                         |                       |
| j                                 | int                                                         |                       |
|                                   | NULL                                                        | NULL                  |
| # Partition Information           | NULL                                                        | NULL                  |
| # col_name                        | data_type                                                   | comment               |
| p                                 | int                                                         |                       |
|                                   | NULL                                                        | NULL                  |
| # Detailed Partition Information  | NULL                                                        | NULL                  |
| Partition Value:                  | [100]                                                       | NULL                  |
| Database:                         | default                                                     | NULL                  |
| Table:                            | test                                                        | NULL                  |
| CreateTime:                       | Fri Jun 07 14:21:17 IST 2024                                | NULL                  |
| LastAccessTime:                   | UNKNOWN                                                     | NULL                  |
| Location:                         | hdfs://localhost:20500/test-warehouse/managed/test/p=100   | NULL                  |
| Partition Parameters:             | NULL                                                        | NULL                  |
|                                   | numFiles                                                    | 1                     |
|                                   | totalSize                                                   | 5                     |
|                                   | transient_lastDdlTime                                       | 1717750277            |
|                                   | NULL                                                        | NULL                  |
| # Storage Information             | NULL                                                        | NULL                  |
| SerDe Library:                    | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          | NULL                  |
| InputFormat:                      | org.apache.hadoop.mapred.TextInputFormat                    | NULL                  |
| OutputFormat:                     | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  | NULL                  |
| Compressed:                       | No                                                          | NULL                  |
| Num Buckets:                      | -1                                                          | NULL                  |
| Bucket Columns:                   | []                                                          | NULL

[jira] [Created] (IMPALA-13141) Partition transactional table is not updated on alter partition when hms_event_incremental_refresh_transactional_table is disabled

2024-06-07 Thread Venugopal Reddy K (Jira)
Venugopal Reddy K created IMPALA-13141:
--

 Summary: Partition transactional table is not updated on alter 
partition when hms_event_incremental_refresh_transactional_table is disabled
 Key: IMPALA-13141
 URL: https://issues.apache.org/jira/browse/IMPALA-13141
 Project: IMPALA
  Issue Type: Bug
Reporter: Venugopal Reddy K


Partition transactional table is not updated on alter partition when 
hms_event_incremental_refresh_transactional_table is disabled. 

*Observations:*

1. In case of AlterPartitionEvent, this issue occurs when 
hms_event_incremental_refresh_transactional_table is disabled.

2. In case of BatchPartitionEvent (when more than one AlterPartitionEvent is 
batched together), this issue occurs without disabling 
hms_event_incremental_refresh_transactional_table.

*Steps to reproduce:*

1. Create a partitioned table and add some partitions from Hive:

Note: This step can be done from Impala too.
{code:java}
0: jdbc:hive2://localhost:11050> create table s(i int, j int, p int);
0: jdbc:hive2://localhost:11050> insert into s values(1,10,100),(2,20,200);

{code}
{code:java}
0: jdbc:hive2://localhost:11050> create table test1(i int, j int) partitioned 
by(p int) tblproperties ('transactional'='true', 
'transactional_properties'='insert_only');
0: jdbc:hive2://localhost:11050> set hive.exec.dynamic.partition.mode=nonstrict;
0: jdbc:hive2://localhost:11050> insert into test partition(p) select * from s;
0: jdbc:hive2://localhost:11050> show partitions test;
+------------+
| partition  |
+------------+
| p=100      |
| p=200      |
+------------+
0: jdbc:hive2://localhost:11050> desc formatted test partition(p=100);
+-----------------------------------+-------------------------------------------------------------+-----------------------+
|             col_name              |                          data_type                          |        comment        |
+-----------------------------------+-------------------------------------------------------------+-----------------------+
| i                                 | int                                                         |                       |
| j                                 | int                                                         |                       |
|                                   | NULL                                                        | NULL                  |
| # Partition Information           | NULL                                                        | NULL                  |
| # col_name                        | data_type                                                   | comment               |
| p                                 | int                                                         |                       |
|                                   | NULL                                                        | NULL                  |
| # Detailed Partition Information  | NULL                                                        | NULL                  |
| Partition Value:                  | [100]                                                       | NULL                  |
| Database:                         | default                                                     | NULL                  |
| Table:                            | test                                                        | NULL                  |
| CreateTime:                       | Fri Jun 07 14:21:17 IST 2024                                | NULL                  |
| LastAccessTime:                   | UNKNOWN                                                     | NULL                  |
| Location:                         | hdfs://localhost:20500/test-warehouse/managed/test/p=100   | NULL                  |
| Partition Parameters:             | NULL                                                        | NULL                  |
|                                   | numFiles                                                    | 1                     |
|                                   | totalSize                                                   | 5                     |
|                                   | transient_lastDdlTime                                       | 1717750277            |
|                                   | NULL                                                        | NULL                  |
| # Storage Information             | NULL                                                        | NULL                  |
| SerDe Library:                    | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          | NULL                  |
| InputFormat:                      | org.apache.hadoop.mapred.TextInputFormat                    | NULL                  |
| OutputFormat:                     | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  | NULL                  |
| Compressed:                       | No                                                          | NULL                  |
| Num Buckets:

[jira] [Assigned] (IMPALA-13140) Add backend flag to disable small string optimization

2024-06-07 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-13140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-13140:
--

Assignee: Zoltán Borók-Nagy

> Add backend flag to disable small string optimization
> -
>
> Key: IMPALA-13140
> URL: https://issues.apache.org/jira/browse/IMPALA-13140
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Critical
>
> We could have a backend flag that would make SmallableString::Smallify() a 
> no-op.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13140) Add backend flag to disable small string optimization

2024-06-07 Thread Jira
Zoltán Borók-Nagy created IMPALA-13140:
--

 Summary: Add backend flag to disable small string optimization
 Key: IMPALA-13140
 URL: https://issues.apache.org/jira/browse/IMPALA-13140
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Zoltán Borók-Nagy


We could have a backend flag that would make SmallableString::Smallify() a 
no-op.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13130) Under heavy load, Impala does not prioritize data stream operations

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853074#comment-17853074
 ] 

ASF subversion and git services commented on IMPALA-13130:
--

Commit 3f827bfc2447d8c11a4f09bcb96e86c53b92d753 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3f827bfc2 ]

IMPALA-13130: Prioritize EndDataStream messages

Prioritize EndDataStream messages over other types handled by
DataStreamService, and avoid rejecting them when memory limit is
reached. They take very little memory (~75 bytes) and will usually help
reduce memory use by closing out in-progress operations.

Adds the 'data_stream_sender_eos_timeout_ms' flag to control EOS
timeouts. Defaults to 1 hour, and can be disabled by setting to -1.

Adds unit tests ensuring EOS are processed even if mem limit is reached
and ahead of TransmitData messages in the queue.

Change-Id: I2829e1ab5bcde36107e10bff5fe629c5ee60f3e8
Reviewed-on: http://gerrit.cloudera.org:8080/21476
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
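
The scheduling idea can be illustrated outside of Impala's C++ DataStreamService 
with a toy bounded queue that always admits and prioritizes EndDataStream while 
still applying backpressure to bulky TransmitData payloads. This is an 
illustration only, under the sizes and RPC names quoted in the commit message 
above, not the actual implementation:
{code:python}
import heapq
import itertools

class ServicePool:
    """Bounded queue that never rejects EndDataStream and serves it first."""

    def __init__(self, mem_limit_bytes):
        self.mem_limit = mem_limit_bytes
        self.mem_used = 0
        self._heap = []
        self._seq = itertools.count()  # FIFO tie-break within a priority level

    def offer(self, rpc_name, payload_bytes):
        is_eos = rpc_name == "EndDataStream"
        if not is_eos and self.mem_used + payload_bytes > self.mem_limit:
            return False  # backpressure for TransmitData, UpdateFilter, etc.
        self.mem_used += payload_bytes
        priority = 0 if is_eos else 1  # EOS jumps ahead of everything else
        heapq.heappush(self._heap,
                       (priority, next(self._seq), rpc_name, payload_bytes))
        return True

    def take(self):
        _, _, rpc_name, payload_bytes = heapq.heappop(self._heap)
        self.mem_used -= payload_bytes
        return rpc_name

pool = ServicePool(mem_limit_bytes=1000)
pool.offer("TransmitData", 900)
pool.offer("EndDataStream", 75)   # still admitted even though the pool is nearly full
assert pool.take() == "EndDataStream"
{code}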


> Under heavy load, Impala does not prioritize data stream operations
> ---
>
> Key: IMPALA-13130
> URL: https://issues.apache.org/jira/browse/IMPALA-13130
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> Under heavy load - where Impala reaches max memory for the DataStreamService 
> and applies backpressure via 
> https://github.com/apache/impala/blob/4.4.0/be/src/rpc/impala-service-pool.cc#L191-L199
>  - DataStreamService does not differentiate between types of requests and may 
> reject requests that could help reduce load.
> The DataStreamService deals with TransmitData, PublishFilter, UpdateFilter, 
> UpdateFilterFromRemote, and EndDataStream. It seems like we should prioritize 
> completing EndDataStream, especially under heavy load, to complete work and 
> release resources more quickly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12569) Harden long string testing

2024-06-07 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-12569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-12569:
---
Priority: Critical  (was: Major)

> Harden long string testing
> --
>
> Key: IMPALA-12569
> URL: https://issues.apache.org/jira/browse/IMPALA-12569
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Infrastructure
>Reporter: Zoltán Borók-Nagy
>Priority: Critical
>
> With small string optimization in place, [~csringhofer] pointed out that most of 
> our test data contains only small strings. New features are typically tested on 
> the existing test tables (e.g. alltypes, which only has small strings), or they 
> add new tests that usually contain only small strings. The latter is hard to 
> prevent. Therefore long strings might get less test coverage if we don't pay 
> enough attention.
> To make the situation better, we could
>  # Add long string data to the string column of alltypes table and 
> complextypestbl and update the tests
> # Add a backend flag that makes StringValue.Smallify() a no-op, and create a 
> test job (probably with an ASAN build) that runs the tests with that flag 
> turned on.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13139) Query options set via ImpalaTestSuite::execute_query_expect_success stay set for subsequent queries

2024-06-06 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-13139:
---
Description: 
When debugging TestRestart, I noticed that the debug_action set for one query 
stayed in effect for subsequent queries that didn't specify query_options.
{noformat}
    DEBUG_ACTION = ("WAIT_BEFORE_PROCESSING_CATALOG_UPDATE:SLEEP@{}"
                    .format(debug_action_sleep_time_sec * 1000))

    query = "alter table {} add columns (age int)".format(tbl_name)
    handle = self.execute_query_async(query, query_options={"debug_action": 
DEBUG_ACTION})

...

# debug_action is still set for these queries:
    self.execute_query_expect_success(self.client, "select age from 
{}".format(tbl_name))
self.execute_query_expect_success(self.client,
        "alter table {} add columns (name string)".format(tbl_name))
    self.execute_query_expect_success(self.client, "select name from 
{}".format(tbl_name)){noformat}
There is a way to clear the query options (self.client.clear_configuration()), 
but this is an odd behavior. It's unclear if some tests rely on this behavior.
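
A condensed sketch of the behavior, assuming Impala's python test framework; the 
helper names are the ones quoted in this report, and the exact call forms are 
assumptions:
{code:python}
# Fragment from inside an ImpalaTestSuite test (self and tbl_name come from the
# surrounding test); not runnable standalone.
DEBUG_ACTION = "WAIT_BEFORE_PROCESSING_CATALOG_UPDATE:SLEEP@30000"

# Passing query_options here leaves them set on the shared self.client session...
handle = self.execute_query_async(
    "alter table {} add columns (age int)".format(tbl_name),
    query_options={"debug_action": DEBUG_ACTION})

# ...so this follow-up statement still runs with debug_action in effect.
self.execute_query_expect_success(
    self.client, "select age from {}".format(tbl_name))

# Current workaround: reset the session's query options explicitly.
self.client.clear_configuration()
self.execute_query_expect_success(
    self.client, "select age from {}".format(tbl_name))  # no debug_action now
{code}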

> Query options set via ImpalaTestSuite::execute_query_expect_success stay set 
> for subsequent queries
> ---
>
> Key: IMPALA-13139
> URL: https://issues.apache.org/jira/browse/IMPALA-13139
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Major
>
> When debugging TestRestart, I noticed that the debug_action set for one query 
> stayed in effect for subsequent queries that didn't specify query_options.
> {noformat}
>     DEBUG_ACTION = ("WAIT_BEFORE_PROCESSING_CATALOG_UPDATE:SLEEP@{}"
>                     .format(debug_action_sleep_time_sec * 1000))
>     query = "alter table {} add columns (age int)".format(tbl_name)
>     handle = self.execute_query_async(query, query_options={"debug_action": 
> DEBUG_ACTION})
> ...
> # debug_action is still set for these queries:
>     self.execute_query_expect_success(self.client, "select age from 
> {}".format(tbl_name))
> self.execute_query_expect_success(self.client,
>         "alter table {} add columns (name string)".format(tbl_name))
>     self.execute_query_expect_success(self.client, "select name from 
> {}".format(tbl_name)){noformat}
> There is a way to clear the query options 
> (self.client.clear_configuration()), but this is an odd behavior. It's 
> unclear if some tests rely on this behavior.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13139) Query options set via ImpalaTestSuite::execute_query_expect_success stay set for subsequent queries

2024-06-06 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13139:
--

 Summary: Query options set via 
ImpalaTestSuite::execute_query_expect_success stay set for subsequent queries
 Key: IMPALA-13139
 URL: https://issues.apache.org/jira/browse/IMPALA-13139
 Project: IMPALA
  Issue Type: Task
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states

2024-06-06 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852961#comment-17852961
 ] 

Joe McDonnell commented on IMPALA-12616:


This is looking timing-related. I was able to get this to pass by adjusting 
some of the sleep times. Basically, it looks like the catalog is slower on s3 
and some operations don't finish in the time we thought they would.

 
{noformat}
    debug_action_sleep_time_sec = 10 (NEW: 30)
    DEBUG_ACTION = ("WAIT_BEFORE_PROCESSING_CATALOG_UPDATE:SLEEP@{}"
                    .format(debug_action_sleep_time_sec * 1000))

    query = "alter table {} add columns (age int)".format(tbl_name)
    handle = self.execute_query_async(query, query_options={"debug_action": 
DEBUG_ACTION})

    # Wait a bit so the RPC from the catalogd arrives to the coordinator.
    time.sleep(0.5) (NEW: 5)

    self.cluster.catalogd.restart()

    # Wait for the query to finish.
    max_wait_time = (debug_action_sleep_time_sec
        + self.WAIT_FOR_CATALOG_UPDATE_TIMEOUT_SEC + 10)
    self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], 
max_wait_time){noformat}
A successful timeline looks like this:

 
 # Submit an alter table that sleeps before processing the catalog update
 # Sleep a little bit so the catalog knows about the alter table
 # Restart the catalogd
 # The catalog sends an update via the statestore. This has the new catalog ID 
and causes this message: "There was an error processing the impalad catalog 
update. Requesting a full topic update to recover: CatalogException: Detected 
catalog service ID changes from 9c9f7ff13f0e4f72:a896bee4d52fd37e to 
da67610b2c304198:a05daf1bc3d6a4b3. Aborting updateCatalog()"
 # The catalogd sends a full topic update
 # The alter table wakes up and prints this message: Catalog service ID 
mismatch. Current ID: da67610b2c304198:a05daf1bc3d6a4b3. ID in response: 
9c9f7ff13f0e4f72:a896bee4d52fd37e. Catalogd may have been restarted. Waiting 
for new catalog update from statestore.
 # Either it times out or there are too many non-empty updates, and the alter 
table bails out with "W0506 22:42:10.316627 23066 impala-server.cc:2369] 
e14b23a22458ab75:6b269414] Ignoring catalog update result of catalog 
service ID 9c9f7ff13f0e4f72:a896bee4d52fd37e because it does not match with 
current catalog service ID da67610b2c304198:a05daf1bc3d6a4b3. The current 
catalog service ID may be stale (this may be caused by the catalogd having been 
restarted more than once) or newer than the catalog service ID of the update 
result."

If the alter table wakes up from its sleep before #5 happens, the alter table 
will see the catalog service ID change and fail. To avoid that, we adjust the 
WAIT_BEFORE_PROCESSING_CATALOG_UPDATE higher. I also lengthened the sleep in #2 
to give the initial catalog some extra time to hear about the alter table. The 
test verifies that the logs contain the expected messages, so this should be a 
safe modification to the test.

> test_restart_catalogd_while_handling_rpc_response* tests fail not reaching 
> expected states
> --
>
> Key: IMPALA-12616
> URL: https://issues.apache.org/jira/browse/IMPALA-12616
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.4.2
>Reporter: Andrew Sherman
>Assignee: Daniel Becker
>Priority: Critical
>
> There are failures in both 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout
>  and 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters,
>  both look the same:
> {code:java}
> custom_cluster/test_restart_services.py:232: in 
> test_restart_catalogd_while_handling_rpc_response_with_timeout
> self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], 
> max_wait_time)
> common/impala_test_suite.py:1181: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1199: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of 
> the expected states [4], last known state 5
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-06-06 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852918#comment-17852918
 ] 

Michael Smith edited comment on IMPALA-12800 at 6/6/24 7:24 PM:


Difference from null slots cache:
{code}
# With caching
Query Compilation: 4s678ms
   - Metadata of all 1 tables cached: 26.276ms (26.276ms)
   - Analysis finished: 3s466ms (3s440ms)
   - Authorization finished (noop): 3s467ms (130.395us)
   - Value transfer graph computed: 3s486ms (19.860ms)
   - Single node plan created: 4s402ms (915.149ms)
   - Runtime filters computed: 4s453ms (51.628ms)
   - Distributed plan created: 4s486ms (33.064ms)
   - Planning finished: 4s678ms (191.281ms)
# Without caching via 'set use_null_slots_cache=false'
Query Compilation: 14s845ms
   - Metadata of all 1 tables cached: 7.608ms (7.608ms)
   - Analysis finished: 3s207ms (3s199ms)
   - Authorization finished (noop): 3s207ms (120.606us)
   - Value transfer graph computed: 3s221ms (14.231ms)
   - Single node plan created: 14s610ms (11s389ms)
   - Runtime filters computed: 14s661ms (51.286ms)
   - Distributed plan created: 14s662ms (246.301us)
   - Planning finished: 14s845ms (183.164ms)
{code}

So speeds up single node planning, adds some overhead to distributed planning. 
I'll look into disabling it for distributed planning.

Update: the time to produce cache logging was actually being lumped into 
"Distributed plan created", so that extra 30ms is from debug logging in 
logCacheStats.
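
As a rough, plain-Python illustration (not the Impala frontend code) of why a 
planner-side cache like the one toggled by 'use_null_slots_cache' collapses the 
"Single node plan created" step: when an expensive, pure check is memoized on a 
canonical key, each distinct expression shape is evaluated only once no matter how 
many times it appears. The names and timings below are made up for the sketch:
{code:python}
import time
from functools import lru_cache

def expensive_backend_check(canonical_expr):
    time.sleep(0.01)          # stands in for a round trip to the backend evaluator
    return canonical_expr.endswith("IS NULL")

@lru_cache(maxsize=None)
def cached_check(canonical_expr):
    return expensive_backend_check(canonical_expr)

# 10,000 predicates, but only a handful of distinct shapes after canonicalization.
predicates = ["col{} IS NULL".format(i % 5) for i in range(10_000)]

start = time.time()
results = [cached_check(p) for p in predicates]
print("cached: {:.2f}s for {} checks, {} backend calls".format(
    time.time() - start, len(results), cached_check.cache_info().misses))
{code}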


was (Author: JIRAUSER288956):
Difference from null slots cache:
{code}
# With caching
Query Compilation: 4s678ms
   - Metadata of all 1 tables cached: 26.276ms (26.276ms)
   - Analysis finished: 3s466ms (3s440ms)
   - Authorization finished (noop): 3s467ms (130.395us)
   - Value transfer graph computed: 3s486ms (19.860ms)
   - Single node plan created: 4s402ms (915.149ms)
   - Runtime filters computed: 4s453ms (51.628ms)
   - Distributed plan created: 4s486ms (33.064ms)
   - Planning finished: 4s678ms (191.281ms)
# Without caching via 'set use_null_slots_cache=false'
Query Compilation: 14s845ms
   - Metadata of all 1 tables cached: 7.608ms (7.608ms)
   - Analysis finished: 3s207ms (3s199ms)
   - Authorization finished (noop): 3s207ms (120.606us)
   - Value transfer graph computed: 3s221ms (14.231ms)
   - Single node plan created: 14s610ms (11s389ms)
   - Runtime filters computed: 14s661ms (51.286ms)
   - Distributed plan created: 14s662ms (246.301us)
   - Planning finished: 14s845ms (183.164ms)
{code}

So speeds up single node planning, adds some overhead to distributed planning. 
I'll look into disabling it for distributed planning.

Update: the time to produce cache logging was actually being lumped into 
"Distributed plan created", so that extra 20s is from debug logging in 
logCacheStats.

> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
>     URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>
> A user running a query with many layers of inline views saw a large amount of 
> time spent in analysis. 
>  
> {noformat}
> - Authorization finished (ranger): 7s518ms (13.134ms)
> - Value transfer graph computed: 7s760ms (241.953ms)
> - Single node plan created: 2m47s (2m39s)
> - Distributed plan created: 2m47s (7.430ms)
> - Lineage info computed: 2m47s (39.017ms)
> - Planning finished: 2m47s (672.518ms){noformat}
> In reproducing it locally, we found that most of the stacks end up in 
> ExprSubstitutionMap.
>  
> Here are the main stacks seen while running jstack every 3 seconds during a 
> 75 second execution:
> Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr 
> equals) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at java.util.ArrayList.indexOf(ArrayList.java:323)
>     at java.util.ArrayList.contains(ArrayList.java:306)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat}
> Location 2:  (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 
> samples)
> {noformat}

[jira] [Comment Edited] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-06-06 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852918#comment-17852918
 ] 

Michael Smith edited comment on IMPALA-12800 at 6/6/24 7:23 PM:


Difference from null slots cache:
{code}
# With caching
Query Compilation: 4s678ms
   - Metadata of all 1 tables cached: 26.276ms (26.276ms)
   - Analysis finished: 3s466ms (3s440ms)
   - Authorization finished (noop): 3s467ms (130.395us)
   - Value transfer graph computed: 3s486ms (19.860ms)
   - Single node plan created: 4s402ms (915.149ms)
   - Runtime filters computed: 4s453ms (51.628ms)
   - Distributed plan created: 4s486ms (33.064ms)
   - Planning finished: 4s678ms (191.281ms)
# Without caching via 'set use_null_slots_cache=false'
Query Compilation: 14s845ms
   - Metadata of all 1 tables cached: 7.608ms (7.608ms)
   - Analysis finished: 3s207ms (3s199ms)
   - Authorization finished (noop): 3s207ms (120.606us)
   - Value transfer graph computed: 3s221ms (14.231ms)
   - Single node plan created: 14s610ms (11s389ms)
   - Runtime filters computed: 14s661ms (51.286ms)
   - Distributed plan created: 14s662ms (246.301us)
   - Planning finished: 14s845ms (183.164ms)
{code}

So it speeds up single node planning and adds some overhead to distributed 
planning. I'll look into disabling it for distributed planning.

Update: the time to produce cache logging was actually being lumped into 
"Distributed plan created", so that extra 20s is from debug logging in 
logCacheStats.


was (Author: JIRAUSER288956):
Difference from null slots cache:
{code}
# With caching
Query Compilation: 4s678ms
   - Metadata of all 1 tables cached: 26.276ms (26.276ms)
   - Analysis finished: 3s466ms (3s440ms)
   - Authorization finished (noop): 3s467ms (130.395us)
   - Value transfer graph computed: 3s486ms (19.860ms)
   - Single node plan created: 4s402ms (915.149ms)
   - Runtime filters computed: 4s453ms (51.628ms)
   - Distributed plan created: 4s486ms (33.064ms)
   - Planning finished: 4s678ms (191.281ms)
# Without caching via 'set use_null_slots_cache=false'
Query Compilation: 14s845ms
   - Metadata of all 1 tables cached: 7.608ms (7.608ms)
   - Analysis finished: 3s207ms (3s199ms)
   - Authorization finished (noop): 3s207ms (120.606us)
   - Value transfer graph computed: 3s221ms (14.231ms)
   - Single node plan created: 14s610ms (11s389ms)
   - Runtime filters computed: 14s661ms (51.286ms)
   - Distributed plan created: 14s662ms (246.301us)
   - Planning finished: 14s845ms (183.164ms)
{code}

So it speeds up single node planning and adds some overhead to distributed 
planning. I'll look into disabling it for distributed planning.

> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
> URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>
> A user running a query with many layers of inline views saw a large amount of 
> time spent in analysis. 
>  
> {noformat}
> - Authorization finished (ranger): 7s518ms (13.134ms)
> - Value transfer graph computed: 7s760ms (241.953ms)
> - Single node plan created: 2m47s (2m39s)
> - Distributed plan created: 2m47s (7.430ms)
> - Lineage info computed: 2m47s (39.017ms)
> - Planning finished: 2m47s (672.518ms){noformat}
> In reproducing it locally, we found that most of the stacks end up in 
> ExprSubstitutionMap.
>  
> Here are the main stacks seen while running jstack every 3 seconds during a 
> 75 second execution:
> Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr 
> equals) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at java.util.ArrayList.indexOf(ArrayList.java:323)
>     at java.util.ArrayList.contains(ArrayList.java:306)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat}
> Location 2:  (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSub

[jira] [Commented] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-06-06 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852918#comment-17852918
 ] 

Michael Smith commented on IMPALA-12800:


Difference from null slots cache:
{code}
# With caching
Query Compilation: 4s678ms
   - Metadata of all 1 tables cached: 26.276ms (26.276ms)
   - Analysis finished: 3s466ms (3s440ms)
   - Authorization finished (noop): 3s467ms (130.395us)
   - Value transfer graph computed: 3s486ms (19.860ms)
   - Single node plan created: 4s402ms (915.149ms)
   - Runtime filters computed: 4s453ms (51.628ms)
   - Distributed plan created: 4s486ms (33.064ms)
   - Planning finished: 4s678ms (191.281ms)
# Without caching via 'set use_null_slots_cache=false'
Query Compilation: 14s845ms
   - Metadata of all 1 tables cached: 7.608ms (7.608ms)
   - Analysis finished: 3s207ms (3s199ms)
   - Authorization finished (noop): 3s207ms (120.606us)
   - Value transfer graph computed: 3s221ms (14.231ms)
   - Single node plan created: 14s610ms (11s389ms)
   - Runtime filters computed: 14s661ms (51.286ms)
   - Distributed plan created: 14s662ms (246.301us)
   - Planning finished: 14s845ms (183.164ms)
{code}

So it speeds up single node planning and adds some overhead to distributed 
planning. I'll look into disabling it for distributed planning.

> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
> URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>
> A user running a query with many layers of inline views saw a large amount of 
> time spent in analysis. 
>  
> {noformat}
> - Authorization finished (ranger): 7s518ms (13.134ms)
> - Value transfer graph computed: 7s760ms (241.953ms)
> - Single node plan created: 2m47s (2m39s)
> - Distributed plan created: 2m47s (7.430ms)
> - Lineage info computed: 2m47s (39.017ms)
> - Planning finished: 2m47s (672.518ms){noformat}
> In reproducing it locally, we found that most of the stacks end up in 
> ExprSubstitutionMap.
>  
> Here are the main stacks seen while running jstack every 3 seconds during a 
> 75 second execution:
> Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr 
> equals) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at java.util.ArrayList.indexOf(ArrayList.java:323)
>     at java.util.ArrayList.contains(ArrayList.java:306)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat}
> Location 2:  (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat}
> Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat}
> Location 4:  (TupleIsNullPredicate.wrapExprs ->  Analyzer.isTrueWithNullSlots 
> -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at java.lang.StringCoding.encode(StringCoding.java:364)
>     at java.lang.String.getBytes(String.java:941)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467)
>     at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQuery

[jira] [Comment Edited] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-06-06 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852912#comment-17852912
 ] 

Michael Smith edited comment on IMPALA-12800 at 6/6/24 7:09 PM:


I've posted two patches at https://gerrit.cloudera.org/c/21484/2 that improve 
query compilation for the repro from
{code}
# 1st run
Query Compilation: 1m15s
   - Metadata load started: 75.088ms (75.088ms)
   - Metadata load finished. loaded-tables=1/1 load-requests=1 
catalog-updates=3 storage-load-time=46ms: 3s137ms (3s062ms)
   - Analysis finished: 7s504ms (4s367ms)
   - Authorization finished (noop): 7s505ms (946.982us)
   - Value transfer graph computed: 7s553ms (47.618ms)
   - Single node plan created: 1m14s (1m7s)
   - Runtime filters computed: 1m15s (874.659ms)
   - Distributed plan created: 1m15s (1.168ms)
   - Planning finished: 1m15s (284.717ms)
# 2nd run
Query Compilation: 1m6s
   - Metadata of all 1 tables cached: 18.799ms (18.799ms)
   - Analysis finished: 3s299ms (3s280ms)
   - Authorization finished (noop): 3s299ms (118.618us)
   - Value transfer graph computed: 3s319ms (19.983ms)
   - Single node plan created: 1m5s (1m2s)
   - Runtime filters computed: 1m6s (808.587ms)
   - Distributed plan created: 1m6s (188.167us)
   - Planning finished: 1m6s (189.985ms)
{code}
to
{code}
# 1st run
Query Compilation: 8s649ms
   - Metadata load started: 62.291ms (62.291ms)
   - Metadata load finished. loaded-tables=1/1 load-requests=1 
catalog-updates=3 storage-load-time=46ms: 3s019ms (2s957ms)
   - Analysis finished: 7s021ms (4s002ms)
   - Authorization finished (noop): 7s021ms (569.098us)
   - Value transfer graph computed: 7s070ms (48.329ms)
   - Single node plan created: 8s194ms (1s124ms)
   - Runtime filters computed: 8s261ms (67.366ms)
   - Distributed plan created: 8s365ms (103.186ms)
   - Planning finished: 8s649ms (284.506ms)
# 2nd run
Query Compilation: 4s621ms
   - Metadata of all 1 tables cached: 17.932ms (17.932ms)
   - Analysis finished: 3s391ms (3s373ms)
   - Authorization finished (noop): 3s391ms (133.671us)
   - Value transfer graph computed: 3s412ms (20.547ms)
   - Single node plan created: 4s347ms (935.582ms)
   - Runtime filters computed: 4s399ms (51.706ms)
   - Distributed plan created: 4s434ms (35.380ms)
   - Planning finished: 4s621ms (187.070ms)
{code}

Single node plan creation improves from over 1 minute to ~1 second. There may 
be some increase from hashing Exprs that makes distributed plan creation a 
little slower, but still on the order of milliseconds.
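
For readers skimming the stack traces quoted in this issue, the hot spot is the 
linear contains()/indexOf() scan over lists whose equals() is expensive. A 
generic, self-contained sketch of the hash-index idea (a hypothetical helper, not 
Impala code; it assumes the element type has a hashCode() consistent with 
equals(), which is the "hashing Exprs" cost mentioned above):
{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: replace O(n) contains()/indexOf() scans, each doing a
// deep equals(), with a hash index maintained alongside the list.
public final class IndexedList<T> {
  private final List<T> items = new ArrayList<>();
  private final Map<T, Integer> firstIndex = new HashMap<>();

  public void add(T item) {
    items.add(item);
    // Relies on T having hashCode() consistent with equals(); computing that hash
    // may be the extra cost that shows up as slightly slower distributed planning.
    firstIndex.putIfAbsent(item, items.size() - 1);
  }

  public boolean contains(T item) { return firstIndex.containsKey(item); }

  public int indexOf(T item) {
    Integer idx = firstIndex.get(item);
    return idx == null ? -1 : idx;
  }
}
{code}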


was (Author: JIRAUSER288956):
I've posted two patches at https://gerrit.cloudera.org/c/21484/2 that improve 
query compilation for the repro from
{code}
# 1st run
Query Compilation: 1m15s
   - Metadata load started: 75.088ms (75.088ms)
   - Metadata load finished. loaded-tables=1/1 load-requests=1 
catalog-updates=3 storage-load-time=46ms: 3s137ms (3s062ms)
   - Analysis finished: 7s504ms (4s367ms)
   - Authorization finished (noop): 7s505ms (946.982us)
   - Value transfer graph computed: 7s553ms (47.618ms)
   - Single node plan created: 1m14s (1m7s)
   - Runtime filters computed: 1m15s (874.659ms)
   - Distributed plan created: 1m15s (1.168ms)
   - Planning finished: 1m15s (284.717ms)
# 2nd run
Query Compilation: 1m6s
   - Metadata of all 1 tables cached: 18.799ms (18.799ms)
   - Analysis finished: 3s299ms (3s280ms)
   - Authorization finished (noop): 3s299ms (118.618us)
   - Value transfer graph computed: 3s319ms (19.983ms)
   - Single node plan created: 1m5s (1m2s)
   - Runtime filters computed: 1m6s (808.587ms)
   - Distributed plan created: 1m6s (188.167us)
   - Planning finished: 1m6s (189.985ms)
{code}
to
{code}
# 1st run
Query Compilation: 8s649ms
   - Metadata load started: 62.291ms (62.291ms)
   - Metadata load finished. loaded-tables=1/1 load-requests=1 
catalog-updates=3 storage-load-time=46ms: 3s019ms (2s957ms)
   - Analysis finished: 7s021ms (4s002ms)
   - Authorization finished (noop): 7s021ms (569.098us)
   - Value transfer graph computed: 7s070ms (48.329ms)
   - Single node plan created: 8s194ms (1s124ms)
   - Runtime filters computed: 8s261ms (67.366ms)
   - Distributed plan created: 8s365ms (103.186ms)
   - Planning finished: 8s649ms (284.506ms)
# 2nd run
Query Compilation: 4s621ms
   - Metadata of all 1 tables cached: 17.932ms (17.932ms)
   - Analysis finished: 3s391ms (3s373ms)
   - Authorization finished (noop): 3s391ms (133.671us)
   - Value transfer graph computed: 3s412ms (20.547ms)
   - Single node plan created: 4s347ms (935.582ms)
   - Runtime filters computed: 4s399ms (51.706ms)
   - Distributed plan created

[jira] [Comment Edited] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-06-06 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852912#comment-17852912
 ] 

Michael Smith edited comment on IMPALA-12800 at 6/6/24 7:09 PM:


I've posted two patches at https://gerrit.cloudera.org/c/21484/2 that improve 
query compilation for the repro from
{code}
# 1st run
Query Compilation: 1m15s
   - Metadata load started: 75.088ms (75.088ms)
   - Metadata load finished. loaded-tables=1/1 load-requests=1 
catalog-updates=3 storage-load-time=46ms: 3s137ms (3s062ms)
   - Analysis finished: 7s504ms (4s367ms)
   - Authorization finished (noop): 7s505ms (946.982us)
   - Value transfer graph computed: 7s553ms (47.618ms)
   - Single node plan created: 1m14s (1m7s)
   - Runtime filters computed: 1m15s (874.659ms)
   - Distributed plan created: 1m15s (1.168ms)
   - Planning finished: 1m15s (284.717ms)
# 2nd run
Query Compilation: 1m6s
   - Metadata of all 1 tables cached: 18.799ms (18.799ms)
   - Analysis finished: 3s299ms (3s280ms)
   - Authorization finished (noop): 3s299ms (118.618us)
   - Value transfer graph computed: 3s319ms (19.983ms)
   - Single node plan created: 1m5s (1m2s)
   - Runtime filters computed: 1m6s (808.587ms)
   - Distributed plan created: 1m6s (188.167us)
   - Planning finished: 1m6s (189.985ms)
{code}
to
{code}
# 1st run
Query Compilation: 8s649ms
   - Metadata load started: 62.291ms (62.291ms)
   - Metadata load finished. loaded-tables=1/1 load-requests=1 
catalog-updates=3 storage-load-time=46ms: 3s019ms (2s957ms)
   - Analysis finished: 7s021ms (4s002ms)
   - Authorization finished (noop): 7s021ms (569.098us)
   - Value transfer graph computed: 7s070ms (48.329ms)
   - Single node plan created: 8s194ms (1s124ms)
   - Runtime filters computed: 8s261ms (67.366ms)
   - Distributed plan created: 8s365ms (103.186ms)
   - Planning finished: 8s649ms (284.506ms)
# 2nd run
Query Compilation: 4s621ms
   - Metadata of all 1 tables cached: 17.932ms (17.932ms)
   - Analysis finished: 3s391ms (3s373ms)
   - Authorization finished (noop): 3s391ms (133.671us)
   - Value transfer graph computed: 3s412ms (20.547ms)
   - Single node plan created: 4s347ms (935.582ms)
   - Runtime filters computed: 4s399ms (51.706ms)
   - Distributed plan created: 4s434ms (35.380ms)
   - Planning finished: 4s621ms (187.070ms)
{code}

Single node plan creation improves from over 1 minute to ~1 second. Runtime 
filter computation also seems to have improved by an order of magnitude. There 
may be some increase from hashing Exprs that makes distributed plan creation a 
little slower, but still on the order of milliseconds.


was (Author: JIRAUSER288956):
I've posted two patches at https://gerrit.cloudera.org/c/21484/2 that improve 
query compilation for the repro from
{code}
# 1st run
Query Compilation: 1m15s
   - Metadata load started: 75.088ms (75.088ms)
   - Metadata load finished. loaded-tables=1/1 load-requests=1 
catalog-updates=3 storage-load-time=46ms: 3s137ms (3s062ms)
   - Analysis finished: 7s504ms (4s367ms)
   - Authorization finished (noop): 7s505ms (946.982us)
   - Value transfer graph computed: 7s553ms (47.618ms)
   - Single node plan created: 1m14s (1m7s)
   - Runtime filters computed: 1m15s (874.659ms)
   - Distributed plan created: 1m15s (1.168ms)
   - Planning finished: 1m15s (284.717ms)
# 2nd run
Query Compilation: 1m6s
   - Metadata of all 1 tables cached: 18.799ms (18.799ms)
   - Analysis finished: 3s299ms (3s280ms)
   - Authorization finished (noop): 3s299ms (118.618us)
   - Value transfer graph computed: 3s319ms (19.983ms)
   - Single node plan created: 1m5s (1m2s)
   - Runtime filters computed: 1m6s (808.587ms)
   - Distributed plan created: 1m6s (188.167us)
   - Planning finished: 1m6s (189.985ms)
{code}
to
{code}
# 1st run
Query Compilation: 8s649ms
   - Metadata load started: 62.291ms (62.291ms)
   - Metadata load finished. loaded-tables=1/1 load-requests=1 
catalog-updates=3 storage-load-time=46ms: 3s019ms (2s957ms)
   - Analysis finished: 7s021ms (4s002ms)
   - Authorization finished (noop): 7s021ms (569.098us)
   - Value transfer graph computed: 7s070ms (48.329ms)
   - Single node plan created: 8s194ms (1s124ms)
   - Runtime filters computed: 8s261ms (67.366ms)
   - Distributed plan created: 8s365ms (103.186ms)
   - Planning finished: 8s649ms (284.506ms)
# 2nd run
Query Compilation: 4s621ms
   - Metadata of all 1 tables cached: 17.932ms (17.932ms)
   - Analysis finished: 3s391ms (3s373ms)
   - Authorization finished (noop): 3s391ms (133.671us)
   - Value transfer graph computed: 3s412ms (20.547ms)
   - Single node plan created: 4s347ms (935.582ms

[jira] [Commented] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-06-06 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852912#comment-17852912
 ] 

Michael Smith commented on IMPALA-12800:


I've posted two patches at https://gerrit.cloudera.org/c/21484/2 that improve 
query compilation for the repro from
{code}
# 1st run
Query Compilation: 1m15s
   - Metadata load started: 75.088ms (75.088ms)
   - Metadata load finished. loaded-tables=1/1 load-requests=1 
catalog-updates=3 storage-load-time=46ms: 3s137ms (3s062ms)
   - Analysis finished: 7s504ms (4s367ms)
   - Authorization finished (noop): 7s505ms (946.982us)
   - Value transfer graph computed: 7s553ms (47.618ms)
   - Single node plan created: 1m14s (1m7s)
   - Runtime filters computed: 1m15s (874.659ms)
   - Distributed plan created: 1m15s (1.168ms)
   - Planning finished: 1m15s (284.717ms)
# 2nd run
Query Compilation: 1m6s
   - Metadata of all 1 tables cached: 18.799ms (18.799ms)
   - Analysis finished: 3s299ms (3s280ms)
   - Authorization finished (noop): 3s299ms (118.618us)
   - Value transfer graph computed: 3s319ms (19.983ms)
   - Single node plan created: 1m5s (1m2s)
   - Runtime filters computed: 1m6s (808.587ms)
   - Distributed plan created: 1m6s (188.167us)
   - Planning finished: 1m6s (189.985ms)
{code}
to
{code}
# 1st run
Query Compilation: 8s649ms
   - Metadata load started: 62.291ms (62.291ms)
   - Metadata load finished. loaded-tables=1/1 load-requests=1 
catalog-updates=3 storage-load-time=46ms: 3s019ms (2s957ms)
   - Analysis finished: 7s021ms (4s002ms)
   - Authorization finished (noop): 7s021ms (569.098us)
   - Value transfer graph computed: 7s070ms (48.329ms)
   - Single node plan created: 8s194ms (1s124ms)
   - Runtime filters computed: 8s261ms (67.366ms)
   - Distributed plan created: 8s365ms (103.186ms)
   - Planning finished: 8s649ms (284.506ms)
# 2nd run
Query Compilation: 4s621ms
   - Metadata of all 1 tables cached: 17.932ms (17.932ms)
   - Analysis finished: 3s391ms (3s373ms)
   - Authorization finished (noop): 3s391ms (133.671us)
   - Value transfer graph computed: 3s412ms (20.547ms)
   - Single node plan created: 4s347ms (935.582ms)
   - Runtime filters computed: 4s399ms (51.706ms)
   - Distributed plan created: 4s434ms (35.380ms)
   - Planning finished: 4s621ms (187.070ms)
{code}

> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
> URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>
> A user running a query with many layers of inline views saw a large amount of 
> time spent in analysis. 
>  
> {noformat}
> - Authorization finished (ranger): 7s518ms (13.134ms)
> - Value transfer graph computed: 7s760ms (241.953ms)
> - Single node plan created: 2m47s (2m39s)
> - Distributed plan created: 2m47s (7.430ms)
> - Lineage info computed: 2m47s (39.017ms)
> - Planning finished: 2m47s (672.518ms){noformat}
> In reproducing it locally, we found that most of the stacks end up in 
> ExprSubstitutionMap.
>  
> Here are the main stacks seen while running jstack every 3 seconds during a 
> 75 second execution:
> Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr 
> equals) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at java.util.ArrayList.indexOf(ArrayList.java:323)
>     at java.util.ArrayList.contains(ArrayList.java:306)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat}
> Location 2:  (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat}
> Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
> 

[jira] [Commented] (IMPALA-12981) Support a column list in compute stats that is retrieved via a subquery

2024-06-06 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852872#comment-17852872
 ] 

Michael Smith commented on IMPALA-12981:


Common Table Expression optimizations would help if we produce multiple 
subqueries that rely on the list of column names.

> Support a column list in compute stats that is retrieved via a subquery  
> -
>
> Key: IMPALA-12981
> URL: https://issues.apache.org/jira/browse/IMPALA-12981
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Frontend
>Reporter: Manish Maheshwari
>Priority: Major
>
> Support a column list in compute stats that is retrieved via a subquery - 
> Specifically, we want to use the Impala query history tables, where we collect 
> the columns in a table that are used for joins, aggregates, filters, etc., to 
> be passed into the compute stats command.
> Ideally the way we would want it to work is to generate a table from the query 
> history table that has the most frequently accessed tables and columns, and 
> then feed them into the compute stats command. 
> Suggested Syntax - 
> {code:java}
> Table Level - 
> compute stats db.tbl (
> select distinct join_columns from
> from sys.impala_query_log
> where contains(tables_queried, "db.tbl")
> and query_dttm >current_timestamp()-7
> and join_columns rlike 'db.tbl'
> ) 
> Across Tables - 
> compute stats on (select tables, columns from sys.impala_query_log where 
> query_dttm > current_timestamp()-7 group tables, columns by order by tables, 
> columns, count(1) desc having count(1) > 1000  )
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12981) Support a column list in compute stats that is retrieved via a subquery

2024-06-06 Thread Riza Suminto (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852871#comment-17852871
 ] 

Riza Suminto commented on IMPALA-12981:
---

Compute stats over just a subset of columns currently relies on getting the 
column names from the SQL syntax of COMPUTE STATS

[https://github.com/apache/impala/blob/753ee9b8a80d8e4c0db966a3132446a5aceb05cd/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java#L189-L191]

Supporting this feature will require running the subquery first to retrieve the 
list of column names before running the rest of the COMPUTE STATS child queries.
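
Until such syntax exists, that two-step flow can be approximated client-side; a 
rough, hypothetical sketch (the JDBC URL and the my_db.frequently_used_columns 
staging table are placeholders, and it relies on the existing 
COMPUTE STATS db.tbl (col1, col2, ...) column-list syntax):
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side approximation: fetch the column list first, then issue
// COMPUTE STATS with the existing column-list syntax.
public final class ComputeStatsFromSubquery {
  public static void main(String[] args) throws Exception {
    // Placeholder URL; any Impala-capable JDBC driver on the classpath will do.
    String url = "jdbc:impala://impalad-host:21050/default";
    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement()) {
      List<String> cols = new ArrayList<>();
      // Placeholder staging table assumed to hold one column name per row,
      // e.g. derived from the query history as the issue suggests.
      try (ResultSet rs = stmt.executeQuery(
          "select column_name from my_db.frequently_used_columns")) {
        while (rs.next()) cols.add(rs.getString(1));
      }
      if (!cols.isEmpty()) {
        stmt.execute("compute stats db.tbl (" + String.join(", ", cols) + ")");
      }
    }
  }
}
{code}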

> Support a column list in compute stats that is retrieved via a subquery  
> -
>
> Key: IMPALA-12981
> URL: https://issues.apache.org/jira/browse/IMPALA-12981
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Frontend
>Reporter: Manish Maheshwari
>Priority: Major
>
> Support a column list in compute stats that is retrieved via a subquery - 
> Specifically, we want to use the Impala query history tables, where we collect 
> the columns in a table that are used for joins, aggregates, filters, etc., to 
> be passed into the compute stats command.
> Ideally the way we would want it to work is to generate a table from the query 
> history table that has the most frequently accessed tables and columns, and 
> then feed them into the compute stats command. 
> Suggested Syntax - 
> {code:java}
> Table Level - 
> compute stats db.tbl (
> select distinct join_columns from
> from sys.impala_query_log
> where contains(tables_queried, "db.tbl")
> and query_dttm >current_timestamp()-7
> and join_columns rlike 'db.tbl'
> ) 
> Across Tables - 
> compute stats on (select tables, columns from sys.impala_query_log where 
> query_dttm > current_timestamp()-7 group tables, columns by order by tables, 
> columns, count(1) desc having count(1) > 1000  )
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13086) Cardinality estimate of AggregationNode should consider predicates on group-by columns

2024-06-06 Thread Riza Suminto (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852862#comment-17852862
 ] 

Riza Suminto commented on IMPALA-13086:
---

Identifying unique columns in Iceberg is possible through 
[https://iceberg.apache.org/spec/#identifier-field-ids] (see also IMPALA-12729).
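
For reference, a minimal sketch of reading those identifier fields with the 
Iceberg Java API (it assumes an already-loaded org.apache.iceberg.Table; catalog 
and table loading are omitted):
{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;

// Sketch: list the identifier (row-uniqueness) columns declared in an Iceberg table.
public final class IcebergIdentifierColumns {
  public static List<String> identifierColumnNames(Table table) {
    Schema schema = table.schema();
    List<String> names = new ArrayList<>();
    for (Integer fieldId : schema.identifierFieldIds()) {
      names.add(schema.findColumnName(fieldId));
    }
    return names;
  }
}
{code}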

> Cardinality estimate of AggregationNode should consider predicates on 
> group-by columns
> --
>
> Key: IMPALA-13086
> URL: https://issues.apache.org/jira/browse/IMPALA-13086
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Quanlong Huang
>Priority: Critical
> Attachments: plan.txt
>
>
> Consider the following tables:
> {code:sql}
> CREATE EXTERNAL TABLE t1(
>   t1_id bigint,
>   t5_id bigint,
>   t5_name string,
>   register_date string
> ) stored as textfile;
> CREATE EXTERNAL TABLE t2(
>   t1_id bigint,
>   t3_id bigint,
>   pay_time timestamp,
>   refund_time timestamp,
>   state_code int
> ) stored as textfile;
> CREATE EXTERNAL TABLE t3(
>   t3_id bigint,
>   t3_name string,
>   class_id int
> ) stored as textfile;
> CREATE EXTERNAL TABLE t5( 
>   id bigint,
>   t5_id bigint,
>   t5_name string,
>   branch_id bigint,
>   branch_name string
> ) stored as textfile;
> alter table t1 set tblproperties('numRows'='6031170829');
> alter table t1 set column stats t1_id ('numDVs'='8131016','numNulls'='0');
> alter table t1 set column stats t5_id ('numDVs'='389','numNulls'='0');
> alter table t1 set column stats t5_name 
> ('numDVs'='523','numNulls'='85928157','maxsize'='27','avgSize'='17.79120063781738');
> alter table t1 set column stats register_date 
> ('numDVs'='9283','numNulls'='0','maxsize'='8','avgSize'='8');
> alter table t2 set tblproperties('numRows'='864341085');
> alter table t2 set column stats t1_id ('numDVs'='1007302','numNulls'='0');
> alter table t2 set column stats t3_id ('numDVs'='5013','numNulls'='2800503');
> alter table t2 set column stats pay_time ('numDVs'='1372020','numNulls'='0');
> alter table t2 set column stats refund_time 
> ('numDVs'='251658','numNulls'='791645118');
> alter table t2 set column stats state_code ('numDVs'='8','numNulls'='0');
> alter table t3 set tblproperties('numRows'='4452');
> alter table t3 set column stats t3_id ('numDVs'='4452','numNulls'='0');
> alter table t3 set column stats t3_name 
> ('numDVs'='4452','numNulls'='0','maxsize'='176','avgSize'='37.60469818115234');
> alter table t3 set column stats class_id ('numDVs'='75','numNulls'='0');
> alter table t5 set tblproperties('numRows'='2177245');
> alter table t5 set column stats t5_id ('numDVs'='826','numNulls'='0');
> alter table t5 set column stats t5_name 
> ('numDVs'='523','numNulls'='0','maxsize'='67','avgSize'='19.12560081481934');
> alter table t5 set column stats branch_id ('numDVs'='53','numNulls'='0');
> alter table t5 set column stats branch_name 
> ('numDVs'='55','numNulls'='0','maxsize'='61','avgSize'='16.05229949951172');
> {code}
> Put a data file to each table to make the stats valid
> {code:bash}
> echo '2024' > data.txt
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t1
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t2
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t3
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t5
> {code}
> REFRESH these tables after adding the data files.
> The cardinality of AggregationNodes is overestimated in the following query:
> {code:sql}
> explain select 
>   register_date,
>   t4.t5_id, 
>   t5.t5_name,
>   t5.branch_name,
>   count(distinct t1_id),
>   count(distinct case when diff_day=0 then t1_id else null end ),
>   count(distinct case when diff_day<=3 then t1_id else null end ),
>   count(distinct case when diff_day<=7 then t1_id else null end ),
>   count(distinct case when diff_day<=14 then t1_id else null end ),
>   count(distinct case when diff_day<=30 then t1_id else null end ),
>   count(distinct case when diff_day<=60 then t1_id else null end ),
>   count(distinct case when pay_time is not null then t1_id else null end )
> from (
>   select t1.t1_id,t1.register_date,t1.t5_id,t2.pay_time,t2.t3_id,t3.t3_name,
> datediff(pay_time,register_date) diff_day
>   from (
> select t1_id,pay_time,t3_id from t2
> where state_code = 0 and pay_time>=trunc(NOW(),'Y')
>   and cast(pay_time as date) <> cast(refund_time as date)
>   )t2
>   join t3 

[jira] [Updated] (IMPALA-12823) Repeated query not found messages in impalad.INFO logs

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar updated IMPALA-12823:
--
Component/s: Backend
   Language: javascript

> Repeated query not found messages in impalad.INFO logs
> --
>
> Key: IMPALA-12823
> URL: https://issues.apache.org/jira/browse/IMPALA-12823
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Minor
> Attachments: repeated_impalad_info_logs.png
>
>
> If the page of an unknown or closed query is open, repeated messages are 
> produced in the logs.
> This is because those pages repeatedly query the impala server even though the 
> query does not exist.
> !repeated_impalad_info_logs.png|width=608,height=84!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-12823) Repeated query not found messages in impalad.INFO logs

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar closed IMPALA-12823.
-

> Repeated query not found messages in impalad.INFO logs
> --
>
> Key: IMPALA-12823
> URL: https://issues.apache.org/jira/browse/IMPALA-12823
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Minor
> Attachments: repeated_impalad_info_logs.png
>
>
> If the page of an unknown or closed query is open, repeated messages are 
> produced in the logs.
> This is because those pages repeatedly query the impala server even though the 
> query does not exist.
> !repeated_impalad_info_logs.png|width=608,height=84!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12803) Fix missing exchange lines in query timeline

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar updated IMPALA-12803:
--
Component/s: Backend
   Language: javascript

> Fix missing exchange lines in query timeline
> 
>
> Key: IMPALA-12803
> URL: https://issues.apache.org/jira/browse/IMPALA-12803
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
> Attachments: missing_exchange_lines.png, proper_exchange_lines.png
>
>
> In the fragment diagram of the query timeline, the exchange lines between 
> nodes are missing when plan order is used. !missing_exchange_lines.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-12803) Fix missing exchange lines in query timeline

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar closed IMPALA-12803.
-

> Fix missing exchange lines in query timeline
> 
>
> Key: IMPALA-12803
> URL: https://issues.apache.org/jira/browse/IMPALA-12803
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
> Attachments: missing_exchange_lines.png, proper_exchange_lines.png
>
>
> In the fragment diagram of the query timeline, the exchange lines between 
> nodes are missing when plan order is used. !missing_exchange_lines.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-12415) Implement tests for graphical query timeline in webUI

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar resolved IMPALA-12415.
---
Resolution: Fixed

> Implement tests for graphical query timeline in webUI
> -
>
> Key: IMPALA-12415
> URL: https://issues.apache.org/jira/browse/IMPALA-12415
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend, Infrastructure
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
>
> Manually testing the webUI's query timeline each time is unreliable and may 
> miss edge cases due to human error.
> To ensure proper functioning as more features are incorporated, proper test 
> cases along with a testing framework are required for the query timeline.
> As a first step, for implementing unit and integration tests, the query 
> timeline's script should be divided into multiple properly functioning 
> modules.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-12415) Implement tests for graphical query timeline in webUI

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar closed IMPALA-12415.
-

> Implement tests for graphical query timeline in webUI
> -
>
> Key: IMPALA-12415
> URL: https://issues.apache.org/jira/browse/IMPALA-12415
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend, Infrastructure
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
>
> Manually testing the webUI's query timeline each time is unreliable and may 
> miss edge cases due to human error.
> To ensure proper functioning as more features are incorporated, proper test 
> cases along with a testing framework are required for the query timeline.
> As a first step, for implementing unit and integration tests, the query 
> timeline's script should be divided into multiple properly functioning 
> modules.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-12364) Display disk and network metrics in webUI's query timeline

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar closed IMPALA-12364.
-

> Display disk and network metrics in webUI's query timeline
> --
>
> Key: IMPALA-12364
> URL: https://issues.apache.org/jira/browse/IMPALA-12364
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Infrastructure
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
> Fix For: Impala 4.4.0
>
> Attachments: average_disk_network_metrics.mkv, 
> averaged_disk_network_metrics.png, both_charts_resize.mkv, 
> both_charts_resize.png, close_cpu_utilization_button.mkv, 
> draggable_resize_handle.png, hor_zoom_buttons.png, 
> horizontal_zoom_buttons.mkv, host_utilization_chart_resize.mkv, 
> host_utilization_close_button.png, host_utilization_resize_bar.png, 
> multiple_fragment_metrics.png, resize_drag_handle.mkv
>
>
> It would be helpful to display disk and network usage in human-readable form 
> on the query timeline, aligned with the CPU utilization plot, below the 
> fragment timing diagram.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-12364) Display disk and network metrics in webUI's query timeline

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar resolved IMPALA-12364.
---
Resolution: Fixed

> Display disk and network metrics in webUI's query timeline
> --
>
> Key: IMPALA-12364
> URL: https://issues.apache.org/jira/browse/IMPALA-12364
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Infrastructure
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
> Fix For: Impala 4.4.0
>
> Attachments: average_disk_network_metrics.mkv, 
> averaged_disk_network_metrics.png, both_charts_resize.mkv, 
> both_charts_resize.png, close_cpu_utilization_button.mkv, 
> draggable_resize_handle.png, hor_zoom_buttons.png, 
> horizontal_zoom_buttons.mkv, host_utilization_chart_resize.mkv, 
> host_utilization_close_button.png, host_utilization_resize_bar.png, 
> multiple_fragment_metrics.png, resize_drag_handle.mkv
>
>
> It would be helpful to display disk and network usage in human-readable form 
> on the query timeline, aligned with the CPU utilization plot, below the 
> fragment timing diagram.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13105) Multiple imported query profiles fail to import/clear at once

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar updated IMPALA-13105:
--
Component/s: Backend

> Multiple imported query profiles fail to import/clear at once
> -
>
> Key: IMPALA-13105
> URL: https://issues.apache.org/jira/browse/IMPALA-13105
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
>
> When multiple query profiles are chosen at once, the last query profile in 
> the insertion queue fails to import because the page reloads without allowing 
> a delay for inserting it.
>  
> The same behavior is seen when clearing all the query profiles.
>  
> This is mostly seen in Chromium-based browsers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13105) Multiple imported query profiles fail to import/clear at once

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar updated IMPALA-13105:
--
Language: javascript

> Multiple imported query profiles fail to import/clear at once
> -
>
> Key: IMPALA-13105
> URL: https://issues.apache.org/jira/browse/IMPALA-13105
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
>
> When multiple query profiles are chosen at once, the last query profile in 
> the insertion queue fails to import because the page reloads without allowing 
> a delay for inserting it.
>  
> The same behavior is seen when clearing all the query profiles.
>  
> This is mostly seen in Chromium-based browsers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-13105) Multiple imported query profiles fail to import/clear at once

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar closed IMPALA-13105.
-

> Multiple imported query profiles fail to import/clear at once
> -
>
> Key: IMPALA-13105
> URL: https://issues.apache.org/jira/browse/IMPALA-13105
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
>
> When multiple query profiles are chosen at once, the last query profile in 
> the insertion queue fails to import because the page reloads without allowing 
> a delay for inserting it.
>  
> The same behavior is seen when clearing all the query profiles.
>  
> This is mostly seen in Chromium-based browsers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13106) Support larger imported query profile sizes through compression

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar updated IMPALA-13106:
--
Language: HTML CSS javascript

> Support larger imported query profile sizes through compression
> ---
>
> Key: IMPALA-13106
> URL: https://issues.apache.org/jira/browse/IMPALA-13106
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Infrastructure
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
>
> Imported query profiles are currently being stored in IndexedDB.
> Although IndexedDB does not have storage limitations like other browser 
> storage APIs, there is a limit on the data that can be stored in one 
> attribute/field.
> This imposes a limitation on the size of query profiles. After some testing, 
> I have found this limit to be around 220 MB.
> So, it would be helpful to use compression on JSON query profiles, allowing 
> for much larger query profiles.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13106) Support larger imported query profile sizes through compression

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar updated IMPALA-13106:
--
Component/s: Backend
 Infrastructure

> Support larger imported query profile sizes through compression
> ---
>
> Key: IMPALA-13106
> URL: https://issues.apache.org/jira/browse/IMPALA-13106
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Infrastructure
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
>
> Imported query profiles are currently being stored in IndexedDB.
> Although IndexedDB does not have storage limitations like other browser 
> storage APIs, there is a limit on the data that can be stored in one 
> attribute/field.
> This imposes a limitation on the size of query profiles. After some testing, 
> I have found this limit to be around 220 MB.
> So, it would be helpful to use compression on JSON query profiles, allowing 
> for much larger query profiles.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12364) Display disk and network metrics in webUI's query timeline

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar updated IMPALA-12364:
--
Component/s: Infrastructure

> Display disk and network metrics in webUI's query timeline
> --
>
> Key: IMPALA-12364
> URL: https://issues.apache.org/jira/browse/IMPALA-12364
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Infrastructure
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
> Fix For: Impala 4.4.0
>
> Attachments: average_disk_network_metrics.mkv, 
> averaged_disk_network_metrics.png, both_charts_resize.mkv, 
> both_charts_resize.png, close_cpu_utilization_button.mkv, 
> draggable_resize_handle.png, hor_zoom_buttons.png, 
> horizontal_zoom_buttons.mkv, host_utilization_chart_resize.mkv, 
> host_utilization_close_button.png, host_utilization_resize_bar.png, 
> multiple_fragment_metrics.png, resize_drag_handle.mkv
>
>
> It would be helpful to display disk and network usage in human-readable form 
> on the query timeline, aligned with the CPU utilization plot, below the 
> fragment timing diagram.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Reopened] (IMPALA-12364) Display disk and network metrics in webUI's query timeline

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar reopened IMPALA-12364:
---

> Display disk and network metrics in webUI's query timeline
> --
>
> Key: IMPALA-12364
> URL: https://issues.apache.org/jira/browse/IMPALA-12364
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
> Fix For: Impala 4.4.0
>
> Attachments: average_disk_network_metrics.mkv, 
> averaged_disk_network_metrics.png, both_charts_resize.mkv, 
> both_charts_resize.png, close_cpu_utilization_button.mkv, 
> draggable_resize_handle.png, hor_zoom_buttons.png, 
> horizontal_zoom_buttons.mkv, host_utilization_chart_resize.mkv, 
> host_utilization_close_button.png, host_utilization_resize_bar.png, 
> multiple_fragment_metrics.png, resize_drag_handle.mkv
>
>
> It would be helpful to display disk and network usage in human-readable form 
> on the query timeline, aligned with the CPU utilization plot, below the 
> fragment timing diagram.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Reopened] (IMPALA-12415) Implement tests for graphical query timeline in webUI

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar reopened IMPALA-12415:
---

> Implement tests for graphical query timeline in webUI
> -
>
> Key: IMPALA-12415
> URL: https://issues.apache.org/jira/browse/IMPALA-12415
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
>
> Manually testing the webUI's query timeline each time is unreliable and may 
> miss edge cases due to human error.
> To ensure proper functioning as more features are incorporated, proper test 
> cases along with a testing framework are required for the query timeline.
> As a first step, for implementing unit and integration tests, the query 
> timeline's script should be divided into multiple properly functioning 
> modules.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12415) Implement tests for graphical query timeline in webUI

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar updated IMPALA-12415:
--
Component/s: Infrastructure
   Language: javascript shell bash  (was: javascript)

> Implement tests for graphical query timeline in webUI
> -
>
> Key: IMPALA-12415
> URL: https://issues.apache.org/jira/browse/IMPALA-12415
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend, Infrastructure
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
>
> Manually testing the webUI's query timeline each time is unreliable and may 
> miss edge cases due to human error.
> To ensure proper functioning as more features are incorporated, proper test 
> cases along with a testing framework are required for the query timeline.
> As a first step, for implementing unit and integration tests, the query 
> timeline's script should be divided into multiple properly functioning 
> modules.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-12688) Support JSON profile imports in webUI

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar closed IMPALA-12688.
-

> Support JSON profile imports in webUI
> -
>
> Key: IMPALA-12688
> URL: https://issues.apache.org/jira/browse/IMPALA-12688
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
> Fix For: Impala 4.4.0
>
> Attachments: clear_all_button.png, descending_order_start_time.png, 
> imported_profiles_section.png, imported_queries_button.png, 
> imported_queries_list.png, imported_queries_page.png, 
> imported_query_statement.png, imported_query_text_plan.png, 
> imported_query_timeline.png, multiple_query_profile_import.png
>
>
> It would be helpful for users to visualize the query timeline by selecting a 
> local JSON query profile.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-12473) Fix query profile's missing event timestamps exception in query timeline

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar closed IMPALA-12473.
-

> Fix query profile's missing event timestamps exception in query timeline
> 
>
> Key: IMPALA-12473
> URL: https://issues.apache.org/jira/browse/IMPALA-12473
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Minor
> Attachments: q4_json_profile_missing_event_timestamp.png, 
> q64_json_profile_missing_event_timestamp.png
>
>
> The query profile contains the event timestamps of different fragments and 
> their plan nodes.
>  
> Sometimes the expected timestamps for an event are missing; this currently 
> throws an exception, which stops the rendering of the query timeline.
>  
> The issue needs to be fixed with some validation of these timestamp event 
> labels, without raising an exception.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-12504) Split graphical query timeline script into modules for testing

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar closed IMPALA-12504.
-

> Split graphical query timeline script into modules for testing
> --
>
> Key: IMPALA-12504
> URL: https://issues.apache.org/jira/browse/IMPALA-12504
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Minor
>
> The graphical query timeline needs to be split into multiple es6 modules for 
> better maintainability and for writing automated tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-12365) Show fragment's memory and thread usage on the query timeline

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar closed IMPALA-12365.
-

> Show fragment's memory and thread usage on the query timeline
> -
>
> Key: IMPALA-12365
> URL: https://issues.apache.org/jira/browse/IMPALA-12365
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
> Attachments: aligned_gridlines.png, 
> aligned_gridlines_and_hovering_scroll.mkv, both_charts_resize.mkv, 
> both_charts_resize.png, clickable_plan_nodes.mkv, clickable_plan_nodes.png, 
> draggable_resize_handle.png, fragment_metrics_chart_resize.mkv, 
> fragment_metrics_close_button.png, fragment_metrics_resize_bar.png, 
> hor_zoom_buttons.png, horizontal_zoom_buttons.mkv, 
> multiple_fragment_metrics.mkv, multiple_fragment_metrics_cropped.png, 
> resize_drag_handle.mkv
>
>
> The query timeline's fragment diagram can be used to display different memory 
> and thread usage metrics, to support query planning and debugging.
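>
> A hypothetical sketch of pulling a per-fragment counter out of the profile 
> so it can be drawn next to that fragment's timing bar; the counters and 
> counter_name field names are assumptions about the JSON profile layout, not 
> confirmed.
> {code:javascript}
> // Return the value of a named counter (e.g. "PeakMemoryUsage") for a
> // fragment node of the profile tree, or null when the counter is absent.
> function fragmentCounter(fragmentNode, counterName) {
>   const counters = fragmentNode.counters || [];
>   const match = counters.find(c => c && c.counter_name === counterName);
>   return match ? match.value : null;
> }
> {code}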



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-12417) Query timeline not working when enable asynchronous codegen

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar closed IMPALA-12417.
-

> Query timeline not working when enable asynchronous codegen
> ---
>
> Key: IMPALA-12417
> URL: https://issues.apache.org/jira/browse/IMPALA-12417
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zihao Ye
>Assignee: Surya Hebbar
>Priority: Major
> Fix For: Impala 4.3.0
>
> Attachments: jirabug.png, jirabug2.jpeg
>
>
> When async_codegen=true is set, the query's timeline page becomes 
> unavailable, seemingly because the preset colors do not account for the 
> events emitted by asynchronous codegen.
> !jirabug2.jpeg!
> !jirabug.png!
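>
> A hypothetical sketch of the kind of fix this suggests: look up an event's 
> color with a fallback so an unlisted event (such as an asynchronous codegen 
> phase) does not break rendering. The color map here is illustrative, not the 
> webUI's actual palette.
> {code:javascript}
> const EVENT_COLORS = {
>   "Codegen": "#1f77b4",
>   "Open Finished": "#2ca02c",
> };
>
> // Unknown event labels fall back to a neutral grey instead of failing.
> function colorForEvent(label) {
>   return EVENT_COLORS[label] || "#7f7f7f";
> }
> {code}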



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-12364) Display disk and network metrics in webUI's query timeline

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar closed IMPALA-12364.
-

> Display disk and network metrics in webUI's query timeline
> --
>
> Key: IMPALA-12364
> URL: https://issues.apache.org/jira/browse/IMPALA-12364
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
> Fix For: Impala 4.4.0
>
> Attachments: average_disk_network_metrics.mkv, 
> averaged_disk_network_metrics.png, both_charts_resize.mkv, 
> both_charts_resize.png, close_cpu_utilization_button.mkv, 
> draggable_resize_handle.png, hor_zoom_buttons.png, 
> horizontal_zoom_buttons.mkv, host_utilization_chart_resize.mkv, 
> host_utilization_close_button.png, host_utilization_resize_bar.png, 
> multiple_fragment_metrics.png, resize_drag_handle.mkv
>
>
> It would be helpful to display disk and network usage in human-readable form 
> on the query timeline, aligned with the CPU utilization plot, below the 
> fragment timing diagram.
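>
> A hypothetical helper for the human-readable form mentioned above: format a 
> byte count (or a bytes-per-second rate) with binary prefixes. Not taken from 
> the Impala webUI source.
> {code:javascript}
> function prettyPrintBytes(bytes) {
>   const units = ["B", "KiB", "MiB", "GiB", "TiB"];
>   let value = bytes;
>   let i = 0;
>   while (value >= 1024 && i < units.length - 1) {
>     value /= 1024;
>     i++;
>   }
>   return `${value.toFixed(1)} ${units[i]}`;
> }
>
> // prettyPrintBytes(3 * 1024 * 1024) === "3.0 MiB"
> {code}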



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-12415) Implement tests for graphical query timeline in webUI

2024-06-06 Thread Surya Hebbar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Hebbar closed IMPALA-12415.
-

> Implement tests for graphical query timeline in webUI
> -
>
> Key: IMPALA-12415
> URL: https://issues.apache.org/jira/browse/IMPALA-12415
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
>
> Manually testing the webUI's query timeline on every change is unreliable: 
> edge cases are easily missed due to human error.
> To ensure the timeline keeps working correctly as more features are 
> incorporated, proper test cases along with a testing framework are required.
> As a first step towards unit and integration tests, the query timeline's 
> script should be divided into multiple properly functioning modules.
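>
> A hypothetical example of the kind of unit test this enables once the script 
> is modular, written for a generic Jest-style runner; the module and function 
> under test are the illustrative ones sketched for IMPALA-12504, not real 
> files.
> {code:javascript}
> import { ticksToMillis } from "./timeline_util.js";
>
> test("ticksToMillis converts using the supplied resolution", () => {
>   expect(ticksToMillis(5000, 10)).toBe(500);
> });
> {code}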



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org


