[jira] [Created] (IMPALA-13155) Not all Tuple::DeepCopy() smallify string values
Zoltán Borók-Nagy created IMPALA-13155: -- Summary: Not all Tuple::DeepCopy() smallify string values Key: IMPALA-13155 URL: https://issues.apache.org/jira/browse/IMPALA-13155 Project: IMPALA Issue Type: Bug Reporter: Zoltán Borók-Nagy Currently "Tuple::DeepCopy(const TupleDescriptor& desc, char** data, int* offset, bool convert_ptrs)" does not try to smallify string values, although it could safely do that. We use that version of DeepCopy when we BROADCAST data between fragments, so smallifying on that path can be beneficial. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap
[ https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854474#comment-17854474 ] ASF subversion and git services commented on IMPALA-12800: -- Commit 4681666e9386d87c647d19d6333750c16b6fa0c1 in impala's branch refs/heads/master from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=4681666e9 ] IMPALA-12800: Add cache for isTrueWithNullSlots() evaluation isTrueWithNullSlots() can be expensive when it has to query the backend. Many of the expressions will look similar, especially in large auto-generated expressions. Adds a cache based on the nullified expression to avoid querying the backend for expressions with identical structure. With DEBUG logging enabled for the Analyzer, computes and logs stats about the null slots cache. Adds 'use_null_slots_cache' query option to disable caching. Documents the new option. Change-Id: Ib63f5553284f21f775d2097b6c5d6bbb63699acd Reviewed-on: http://gerrit.cloudera.org:8080/21484 Reviewed-by: Quanlong Huang Tested-by: Impala Public Jenkins > Queries with many nested inline views see performance issues with > ExprSubstitutionMap > - > > Key: IMPALA-12800 > URL: https://issues.apache.org/jira/browse/IMPALA-12800 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 4.3.0 >Reporter: Joe McDonnell >Assignee: Michael Smith >Priority: Critical > Fix For: Impala 4.5.0 > > Attachments: impala12800repro.sql, impala12800schema.sql, > long_query_jstacks.tar.gz > > > A user running a query with many layers of inline views saw a large amount of > time spent in analysis. 
> > {noformat} > - Authorization finished (ranger): 7s518ms (13.134ms) > - Value transfer graph computed: 7s760ms (241.953ms) > - Single node plan created: 2m47s (2m39s) > - Distributed plan created: 2m47s (7.430ms) > - Lineage info computed: 2m47s (39.017ms) > - Planning finished: 2m47s (672.518ms){noformat} > In reproducing it locally, we found that most of the stacks end up in > ExprSubstitutionMap. > > Here are the main stacks seen while running jstack every 3 seconds during a > 75 second execution: > Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr > equals) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at java.util.ArrayList.indexOf(ArrayList.java:323) > at java.util.ArrayList.contains(ArrayList.java:306) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat} > Location 2: (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 > samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at > org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat} > Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 > samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at > org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173) > at > org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat} > Location 4: (TupleIsNullPredicate.wrapExprs -> Analyzer.isTrueWithNullSlots > -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at java.lang.StringCoding.encode(StringCoding.java:364) > at java.lang.String.getBytes(String.java:941) > at > 
org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227) > at > org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532) > at > org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467) > at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394) > at > org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034) > at > org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709) > at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400) > at org.apache.thrift.TSerializer.serialize(TSerializ
[jira] [Comment Edited] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap
[ https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854473#comment-17854473 ] Michael Smith edited comment on IMPALA-12800 at 6/12/24 3:36 PM: - Performance on these types of queries has been substantially improved. We saw an improvement of 20x on the example query. It would likely be more on larger queries as we switched from O(n^2) to O(n) operations for ExprSubstitutionMap. > Queries with many nested inline views see performance issues with > ExprSubstitutionMap > - > > Key: IMPALA-12800 > URL: https://issues.apache.org/jira/browse/IMPALA-12800 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 4.3.0 >Reporter: Joe McDonnell >Assignee: Michael Smith >Priority: Critical > Fix For: Impala 4.5.0 > > Attachments: impala12800repro.sql, impala12800schema.sql, > long_query_jstacks.tar.gz > > > A user running a query with many layers of inline views saw a large amount of > time spent in analysis. > > {noformat} > - Authorization finished (ranger): 7s518ms (13.134ms) > - Value transfer graph computed: 7s760ms (241.953ms) > - Single node plan created: 2m47s (2m39s) > - Distributed plan created: 2m47s (7.430ms) > - Lineage info computed: 2m47s (39.017ms) > - Planning finished: 2m47s (672.518ms){noformat} > In reproducing it locally, we found that most of the stacks end up in > ExprSubstitutionMap. 
> > Here are the main stacks seen while running jstack every 3 seconds during a > 75 second execution: > Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr > equals) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at java.util.ArrayList.indexOf(ArrayList.java:323) > at java.util.ArrayList.contains(ArrayList.java:306) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat} > Location 2: (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 > samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at > org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat} > Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 > samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at > org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173) > at > org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat} > Location 4: (TupleIsNullPredicate.wrapExprs -> Analyzer.isTrueWithNullSlots > -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at java.lang.StringCoding.encode(StringCoding.java:364) > at java.lang.String.getBytes(String.java:941) > at > org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227) > at > org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532) > at > org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467) > at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394) > at > 
org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034) > at > org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709) > at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400) > at org.apache.thrift.TSerializer.serialize(TSerializer.java:84) > at > org.apache.impala.service.FeSupport.EvalExprWithoutRowBounded(FeSupport.java:206) > at > org.apache.impala.service.FeSupport.EvalExprWithoutRow(FeSupport.java:194) > at org.apache.impala.service.FeSupport.EvalPredicate(FeSupport.java:275) > at
[jira] [Resolved] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap
[ https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith resolved IMPALA-12800. Fix Version/s: Impala 4.5.0 Resolution: Fixed Performance on these types of queries has been substantially improved. We saw an improvement of 20x on the example query. It would likely be more on larger queries as we switched from O(n^2) to O(n) operations for ExprSubstitutionMap. > Queries with many nested inline views see performance issues with > ExprSubstitutionMap > - > > Key: IMPALA-12800 > URL: https://issues.apache.org/jira/browse/IMPALA-12800 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 4.3.0 >Reporter: Joe McDonnell >Assignee: Michael Smith >Priority: Critical > Fix For: Impala 4.5.0 > > Attachments: impala12800repro.sql, impala12800schema.sql, > long_query_jstacks.tar.gz > > > A user running a query with many layers of inline views saw a large amount of > time spent in analysis. > > {noformat} > - Authorization finished (ranger): 7s518ms (13.134ms) > - Value transfer graph computed: 7s760ms (241.953ms) > - Single node plan created: 2m47s (2m39s) > - Distributed plan created: 2m47s (7.430ms) > - Lineage info computed: 2m47s (39.017ms) > - Planning finished: 2m47s (672.518ms){noformat} > In reproducing it locally, we found that most of the stacks end up in > ExprSubstitutionMap. 
> > Here are the main stacks seen while running jstack every 3 seconds during a > 75 second execution: > Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr > equals) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at java.util.ArrayList.indexOf(ArrayList.java:323) > at java.util.ArrayList.contains(ArrayList.java:306) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat} > Location 2: (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 > samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at > org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat} > Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 > samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at > org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173) > at > org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat} > Location 4: (TupleIsNullPredicate.wrapExprs -> Analyzer.isTrueWithNullSlots > -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at java.lang.StringCoding.encode(StringCoding.java:364) > at java.lang.String.getBytes(String.java:941) > at > org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227) > at > org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532) > at > org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467) > at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394) > at > 
org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034) > at > org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709) > at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400) > at org.apache.thrift.TSerializer.serialize(TSerializer.java:84) > at > org.apache.impala.service.FeSupport.EvalExprWithoutRowBounded(FeSupport.java:206) > at > org.apache.impala.service.FeSupport.EvalExprWithoutRow(FeSupport.java:194) > at org.apache.impala.service.FeSupport.EvalPredicate(FeSupport.java:275) > at > org.apache.impala.analysis.Analyzer.isTrueWithNullSlots(Analyzer.java:2888) > at > org.apache.impala.analysis.TupleIsNullPredicate.requiresNullWrapping(TupleIsNullPredicate.java:181) > at > org.apache.impala.analysis.TupleIsNullPredicate.wrapExpr(TupleIsNullPredicate.java:147) > at > org.apache.impala.analysis.TupleIsN
[jira] [Created] (IMPALA-13154) Some tables are missing in Top-N Tables with Highest Memory Requirements
Quanlong Huang created IMPALA-13154: --- Summary: Some tables are missing in Top-N Tables with Highest Memory Requirements Key: IMPALA-13154 URL: https://issues.apache.org/jira/browse/IMPALA-13154 Project: IMPALA Issue Type: Bug Components: Catalog Reporter: Quanlong Huang In the /catalog page of the catalogd WebUI, there is a table for "Top-N Tables with Highest Memory Requirements". However, not all tables are counted there. E.g. after starting catalogd, run a DESCRIBE on a table to trigger metadata loading on it. When it's done, the table is not shown in the WebUI. The cause is that the list is only updated in HdfsTable.getTHdfsTable() when 'type' is ThriftObjectType.FULL: [https://github.com/apache/impala/blob/ee21427d26620b40d38c706b4944d2831f84f6f5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L2457-L2459] This used to be a place that all code paths using the table would go through. However, we've since made a number of optimizations that avoid getting the FULL thrift object of the table. We should move the code that updates the list of largest tables to somewhere all table usages reach, e.g. after loading the metadata of a table, we can update its estimatedMetadataSize.
[jira] [Commented] (IMPALA-13153) Unreachable catch clause in MetastoreEvents.java
[ https://issues.apache.org/jira/browse/IMPALA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854208#comment-17854208 ] Sai Hemanth Gantasala commented on IMPALA-13153: Thanks for raising the concern. I'll address this issue soon. > Unreachable catch clause in MetastoreEvents.java > > > Key: IMPALA-13153 > URL: https://issues.apache.org/jira/browse/IMPALA-13153 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Affects Versions: Impala 4.5.0 >Reporter: Laszlo Gaal >Assignee: Sai Hemanth Gantasala >Priority: Critical > > In recent builds of master the frontend build reports the following warning: > {code} > 22:38:28 20:38:19 [WARNING] > /home/ubuntu/Impala/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:[1466,9] > unreachable catch clause > {code}
[jira] [Created] (IMPALA-13153) Unreachable catch clause in MetastoreEvents.java
Laszlo Gaal created IMPALA-13153: Summary: Unreachable catch clause in MetastoreEvents.java Key: IMPALA-13153 URL: https://issues.apache.org/jira/browse/IMPALA-13153 Project: IMPALA Issue Type: Bug Components: Catalog Affects Versions: Impala 4.5.0 Reporter: Laszlo Gaal Assignee: Sai Hemanth Gantasala In recent builds of master the frontend build reports the following warning: {code} 22:38:28 20:38:19 [WARNING] /home/ubuntu/Impala/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:[1466,9] unreachable catch clause {code}
[jira] [Resolved] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS
[ https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-11871. -- Resolution: Fixed Resolve the issue since the fix has been merged. > INSERT statement does not respect Ranger policies for HDFS > -- > > Key: IMPALA-11871 > URL: https://issues.apache.org/jira/browse/IMPALA-11871 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > In a cluster with Ranger auth (and with legacy catalog mode), even if you > provide RWX to cm_hdfs -> all-path for the user impala, inserting into a > table whose HDFS POSIX permissions happen to exclude impala access will > result in an > {noformat} > "AnalysisException: Unable to INSERT into target table (default.t1) because > Impala does not have WRITE access to HDFS location: > hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat} > > {noformat} > [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl > /warehouse/tablespace/external/hive/t1 > file: /warehouse/tablespace/external/hive/t1 > owner: hive > group: supergroup > user::rwx > user:impala:rwx #effective:r-x > group::rwx #effective:r-x > mask::r-x > other::--- > default:user::rwx > default:user:impala:rwx > default:group::rwx > default:mask::rwx > default:other::--- {noformat} > ~~ > ANALYSIS > Stack trace from a version of Cloudera's distribution of Impala (impalad > version 3.4.0-SNAPSHOT RELEASE (build > {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})): > {noformat} > at > org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585) > at > org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545) > at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391) > at > org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463) > at > 
org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426) > at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570) > at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536) > at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155){noformat} > The exception occurs at analysis time, so I tested and succeeded in writing > directly into the said directory. > {noformat} > [root@nightly-71x-vx-3 ~]# hdfs dfs -touchz > /warehouse/tablespace/external/hive/t1/test > [root@nightly-71x-vx-3 ~]# hdfs dfs -ls > /warehouse/tablespace/external/hive/t1/ > Found 8 items > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:37 > /warehouse/tablespace/external/hive/t1/00_0 > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:44 > /warehouse/tablespace/external/hive/t1/00_0_copy_1 > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:49 > /warehouse/tablespace/external/hive/t1/00_0_copy_2 > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:53 > /warehouse/tablespace/external/hive/t1/00_0_copy_3 > rw-rw---+ 3 impala hive 355 2023-01-27 17:17 > /warehouse/tablespace/external/hive/t1/4c4477c12c51ad96-3126b52d_2029811630_data.0.parq > rw-rw---+ 3 impala hive 355 2023-01-27 17:39 > /warehouse/tablespace/external/hive/t1/9945b25bb37d1ff2-473c1478_574471191_data.0.parq > drwxrwx---+ - impala hive 0 2023-01-27 17:39 > /warehouse/tablespace/external/hive/t1/_impala_insert_staging > rw-rw---+ 3 impala supergroup 0 2023-01-27 18:01 > /warehouse/tablespace/external/hive/t1/test{noformat} > Reviewing the code[1], I traced the {{TAccessLevel}} to the catalogd. And if > I add user impala to group supergroup on the catalogd host, this query will > succeed past the authorization. 
> Additionally, this query does not trip up during analysis when catalog v2 is > enabled because the method {{getFirstLocationWithoutWriteAccess()}} is not > implemented there yet and always returns null[2]. > [1] > [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L494-L504] > [2] > [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java#L295-L298] > ~~ > Ideally, when Ranger authorization is in place, we should: > 1) Not check access level during analysis > 2) Incorporate Ranger ACLs during analysis
[jira] [Commented] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns
[ https://issues.apache.org/jira/browse/IMPALA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854130#comment-17854130 ] Riza Suminto commented on IMPALA-13152: --- Filed a patch at: [https://gerrit.cloudera.org/c/21504/] > IllegalStateException in computing processing cost when there are predicates > on analytic output columns > --- > > Key: IMPALA-13152 > URL: https://issues.apache.org/jira/browse/IMPALA-13152 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Quanlong Huang >Assignee: Riza Suminto >Priority: Major > > Saw an error in the following query when COMPUTE_PROCESSING_COST is on: > {code:sql} > create table tbl (a int, b int, c int); > set COMPUTE_PROCESSING_COST=1; > explain select a, b from ( > select a, b, c, > row_number() over(partition by a order by b desc) as latest > from tbl > )b > WHERE latest=1 > ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid! > {code} > Exception in the logs: > {noformat} > I0611 13:04:37.192874 28004 jni-util.cc:321] > 264ee79bfb6ac031:42f8006c] java.lang.IllegalStateException: > Processing cost of PlanNode 01:TOP-N is invalid! 
> at > com.google.common.base.Preconditions.checkState(Preconditions.java:512) > at > org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047) > at > org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287) > at > org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932) > at > org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892) > at > org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676) > at > org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175){noformat} > Don't see the error if removing the predicate "latest=1".
[jira] [Resolved] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
[ https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith resolved IMPALA-13151. Fix Version/s: Impala 4.5.0 Resolution: Fixed > DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM > - > > Key: IMPALA-13151 > URL: https://issues.apache.org/jira/browse/IMPALA-13151 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.5.0 >Reporter: Joe McDonnell >Assignee: Michael Smith >Priority: Critical > Labels: broken-build > Fix For: Impala 4.5.0 > > > The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is > failing with errors like this: > {noformat} > /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912 > Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), > actual: 269834 vs 30{noformat} > So far, I only see failures on ARM jobs.
[jira] [Commented] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns
[ https://issues.apache.org/jira/browse/IMPALA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854089#comment-17854089 ] Riza Suminto commented on IMPALA-13152: --- Tried your example and I get NaN for BaseProcessingCost. {noformat}Query: explain select a, b from ( select a, b, c, row_number() over(partition by a order by b desc) as latest from tbl )b WHERE latest=1 ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid! cost-total=0 max-instances=1 cost/inst=0 #cons:#prod=0:0 total-cost=NaN{noformat} > IllegalStateException in computing processing cost when there are predicates > on analytic output columns > --- > > Key: IMPALA-13152 > URL: https://issues.apache.org/jira/browse/IMPALA-13152 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Quanlong Huang >Assignee: Riza Suminto >Priority: Major > > Saw an error in the following query when COMPUTE_PROCESSING_COST is on: > {code:sql} > create table tbl (a int, b int, c int); > set COMPUTE_PROCESSING_COST=1; > explain select a, b from ( > select a, b, c, > row_number() over(partition by a order by b desc) as latest > from tbl > )b > WHERE latest=1 > ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid! > {code} > Exception in the logs: > {noformat} > I0611 13:04:37.192874 28004 jni-util.cc:321] > 264ee79bfb6ac031:42f8006c] java.lang.IllegalStateException: > Processing cost of PlanNode 01:TOP-N is invalid! 
> at > com.google.common.base.Preconditions.checkState(Preconditions.java:512) > at > org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047) > at > org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287) > at > org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932) > at > org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892) > at > org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676) > at > org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175){noformat} > Don't see the error if removing the predicate "latest=1".
[jira] [Commented] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns
[ https://issues.apache.org/jira/browse/IMPALA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854069#comment-17854069 ] Riza Suminto commented on IMPALA-13152: --- [~stigahuang] does this still happen after IMPALA-13119? I tried with a similar query below and it works: {noformat} Query: explain select item_sk, rk from ( select ss_item_sk item_sk, ss_sold_time_sk, ss_customer_sk, row_number() over (partition by ss_item_sk order by ss_sold_time_sk) rk from store_sales ) b where rk = 1 +-+ | Explain String | +-+ | Max Per-Host Resource Reservation: Memory=28.00MB Threads=4 | | Per-Host Resource Estimates: Memory=58MB | | Analyzed query: SELECT item_sk, rk FROM (SELECT ss_item_sk item_sk, | | ss_sold_time_sk, ss_customer_sk, row_number() OVER (PARTITION BY ss_item_sk | | ORDER BY ss_sold_time_sk ASC) rk FROM tpcds_parquet.store_sales) b WHERE rk = | | CAST(1 AS BIGINT) | | | | F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 | | | Per-Instance Resources: mem-estimate=4.20MB mem-reservation=4.00MB thread-reservation=1 | | | max-parallelism=1 segment-costs=[40262] cpu-comparison-result=6 [max(1 (self) vs 6 (sum children))] | | PLAN-ROOT SINK | | | output exprs: ss_item_sk, row_number() | | | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 cost=35950 | | | | | 06:EXCHANGE [UNPARTITIONED] | | | mem-estimate=201.02KB mem-reservation=0B thread-reservation=0 | | | tuple-ids=5,4 row-size=20B cardinality=17.98K cost=4312 | | | in pipelines: 05(GETNEXT) | | | | | F01:PLAN FRAGMENT [HASH(ss_item_sk)] hosts=3 instances=3 (adjusted from 384) | | Per-Instance Resources: mem-estimate=10.16MB mem-reservation=10.00MB thread-reservation=1 | | max-parallelism=3 segment-costs=[146224, 77623] cpu-comparison-result=6 [max(3 (self) vs 6 (sum children))] | | 03:SELECT | | | predicates: row_number() = CAST(1 AS BIGINT) | | | mem-estimate=0B mem-reservation=0B thread-reservation=0 | | | tuple-ids=5,4 row-size=20B 
cardinality=17.98K cost=17975 | | | in pipelines: 05(GETNEXT) | | | | | 02:ANALYTIC | | | functions: row_number() | | | partition by: ss_item_sk | | | order by: ss_sold_time_sk ASC | | | window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW | | | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 | | | tuple-ids=5,4 row-size=20B cardinality=17.98K cost=17975 | | | in pipelines: 05(GETNEXT
[jira] [Work started] (IMPALA-13150) Possible buffer overflow in StringVal::CopyFrom()
[ https://issues.apache.org/jira/browse/IMPALA-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-13150 started by Daniel Becker. -- > Possible buffer overflow in StringVal::CopyFrom() > - > > Key: IMPALA-13150 > URL: https://issues.apache.org/jira/browse/IMPALA-13150 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > In {{StringVal::CopyFrom()}}, we take the 'len' parameter as a > {{size_t}}, which is usually a 64-bit unsigned integer. We pass it to the > constructor of {{StringVal}}, which takes it as an {{int}}, which is > usually a 32-bit signed integer. The constructor then allocates memory for > the length using the {{int}} value, but back in {{CopyFrom()}}, we copy > the buffer with the {{size_t}} length. If {{size_t}} is indeed 64 bits and > {{int}} is 32 bits, and the value is truncated, we may copy more bytes > than we have allocated the destination for. See > https://github.com/apache/impala/blob/ce8078204e5995277f79e226e26fe8b9eaca408b/be/src/udf/udf.cc#L546
[jira] [Updated] (IMPALA-13150) Possible buffer overflow in StringVal::CopyFrom()
[ https://issues.apache.org/jira/browse/IMPALA-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker updated IMPALA-13150: --- Summary: Possible buffer overflow in StringVal::CopyFrom() (was: Possible buffer overflow in StringVal) > Possible buffer overflow in StringVal::CopyFrom() > - > > Key: IMPALA-13150 > URL: https://issues.apache.org/jira/browse/IMPALA-13150 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > In {{{}StringVal::CopyFrom(){}}}, we take the 'len' parameter as a > {{{}size_t{}}}, which is usually a 64-bit unsigned integer. We pass it to the > constructor of {{{}StringVal{}}}, which takes it as an {{{}int{}}}, which is > usually a 32-bit signed integer. The constructor then allocates memory for > the length using the {{int}} value, but back in {{{}CopyFrom(){}}}, we copy > the buffer with the {{size_t}} length. If {{size_t}} is indeed 64 bits and > {{int}} is 32 bits, and the value is truncated, we may copy more bytes than > we have allocated the destination for. See > https://github.com/apache/impala/blob/ce8078204e5995277f79e226e26fe8b9eaca408b/be/src/udf/udf.cc#L546
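The narrowing hazard described above can be sketched in a few lines of C++. This is a minimal illustration with hypothetical helper names (not Impala's actual udf.cc code), assuming a 64-bit {{size_t}} and 32-bit {{int}}: the allocation would use the narrowed {{int}} while the copy would use the original {{size_t}}.

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the IMPALA-13150 hazard: a 64-bit size_t length is narrowed to a
// 32-bit int for allocation, while the copy still uses the size_t length.
// 'NarrowLen' and 'CopyWouldBeSafe' are hypothetical helpers for illustration.
int NarrowLen(std::size_t len) {
  // Implicit narrowing, as when passing a size_t to a constructor taking int.
  // (Out-of-range values wrap on common two's-complement platforms.)
  return static_cast<int>(len);
}

bool CopyWouldBeSafe(std::size_t len) {
  // Safe only if the narrowed allocation length still matches the size_t
  // length used for the copy; otherwise the copy overruns the allocation.
  int alloc_len = NarrowLen(len);
  return alloc_len >= 0 && static_cast<std::size_t>(alloc_len) == len;
}
```

For lengths that fit in an {{int}} the two agree; past 2^31 the narrowed value goes negative or wraps, which is exactly the condition the fix has to reject or avoid.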
[jira] [Commented] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns
[ https://issues.apache.org/jira/browse/IMPALA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853924#comment-17853924 ] Quanlong Huang commented on IMPALA-13152: - Assigning this to [~rizaon] who knows more about this. > IllegalStateException in computing processing cost when there are predicates > on analytic output columns > --- > > Key: IMPALA-13152 > URL: https://issues.apache.org/jira/browse/IMPALA-13152 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Quanlong Huang >Assignee: Riza Suminto >Priority: Major > > Saw an error in the following query when COMPUTE_PROCESSING_COST is on: > {code:sql} > create table tbl (a int, b int, c int); > set COMPUTE_PROCESSING_COST=1; > explain select a, b from ( > select a, b, c, > row_number() over(partition by a order by b desc) as latest > from tbl > )b > WHERE latest=1 > ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid! > {code} > Exception in the logs: > {noformat} > I0611 13:04:37.192874 28004 jni-util.cc:321] > 264ee79bfb6ac031:42f8006c] java.lang.IllegalStateException: > Processing cost of PlanNode 01:TOP-N is invalid! 
> at > com.google.common.base.Preconditions.checkState(Preconditions.java:512) > at > org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047) > at > org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287) > at > org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932) > at > org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892) > at > org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676) > at > org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175){noformat} > The error does not appear if the predicate "latest=1" is removed.
[jira] [Commented] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap
[ https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853919#comment-17853919 ] ASF subversion and git services commented on IMPALA-12800: -- Commit 800246add5fcb20c34a767870346f6ce255e41f9 in impala's branch refs/heads/master from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=800246add ] IMPALA-12800: Use HashMap for ExprSubstitutionMap lookups Adds a HashMap to ExprSubstitutionMap to speed lookups while retaining lists for correct ordering (ordering needs to match to SlotRef order). Ignores duplicate inserts, preserving the old behavior that only the first match would actually be usable; duplicates primarily show up as a result of combining duplicate distinct and aggregate expressions, or redundant nested aggregation (like the tests for IMPALA-10182). Implements localHash and hashCode for Expr and related classes. Avoids deep-cloning LHS Exprs in ExprSubstitutionMap as they're used for lookup and not expected to be mutated. Adds the many expressions test, which now runs in a handful of seconds. Change-Id: Ic538a82c69ee1dd76981fbacf95289c9d00ea9fe Reviewed-on: http://gerrit.cloudera.org:8080/21483 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Queries with many nested inline views see performance issues with > ExprSubstitutionMap > - > > Key: IMPALA-12800 > URL: https://issues.apache.org/jira/browse/IMPALA-12800 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 4.3.0 >Reporter: Joe McDonnell >Assignee: Michael Smith >Priority: Critical > Attachments: impala12800repro.sql, impala12800schema.sql, > long_query_jstacks.tar.gz > > > A user running a query with many layers of inline views saw a large amount of > time spent in analysis. 
> > {noformat} > - Authorization finished (ranger): 7s518ms (13.134ms) > - Value transfer graph computed: 7s760ms (241.953ms) > - Single node plan created: 2m47s (2m39s) > - Distributed plan created: 2m47s (7.430ms) > - Lineage info computed: 2m47s (39.017ms) > - Planning finished: 2m47s (672.518ms){noformat} > In reproducing it locally, we found that most of the stacks end up in > ExprSubstitutionMap. > > Here are the main stacks seen while running jstack every 3 seconds during a > 75 second execution: > Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr > equals) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at java.util.ArrayList.indexOf(ArrayList.java:323) > at java.util.ArrayList.contains(ArrayList.java:306) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat} > Location 2: (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 > samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at > org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat} > Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 > samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at > org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173) > at > org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat} > Location 4: (TupleIsNullPredicate.wrapExprs -> Analyzer.isTrueWithNullSlots > -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at java.lang.StringCoding.encode(StringCoding.java:364) > at java.lang.String.getBytes(String.java:941) > at > 
org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227) > at > org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532) > at > org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467) > at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394) > at > org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034) > at > org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.wr
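The commit above keeps ordered lists for correctness (ordering must match SlotRef order) while adding a hash index for constant-time lookups and ignoring duplicate inserts. Below is a minimal sketch of that data-structure idea; it is illustrative C++ over strings, not the actual Java ExprSubstitutionMap, which stores Exprs and uses their hashCode.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Sketch of the IMPALA-12800 approach: insertion-ordered lhs/rhs lists plus a
// hash index keyed on the lhs. First insert wins, preserving the old
// linear-scan behavior where only the first match was reachable.
class SubstMap {
 public:
  void Put(const std::string& lhs, const std::string& rhs) {
    if (index_.count(lhs)) return;  // ignore duplicate inserts
    index_[lhs] = lhs_.size();
    lhs_.push_back(lhs);
    rhs_.push_back(rhs);
  }
  // Returns the substitution for 'lhs', or nullptr if none is mapped.
  const std::string* Get(const std::string& lhs) const {
    auto it = index_.find(lhs);  // O(1) instead of an O(n) list scan
    return it == index_.end() ? nullptr : &rhs_[it->second];
  }
 private:
  std::vector<std::string> lhs_, rhs_;             // preserve insertion order
  std::unordered_map<std::string, std::size_t> index_;  // fast lookup
};
```

With many nested inline views the maps are composed repeatedly, so replacing the linear `indexOf`/`contains` scans with hashed lookups is what turns the quadratic analysis cost into near-linear.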
[jira] [Commented] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
[ https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853921#comment-17853921 ] ASF subversion and git services commented on IMPALA-13151: -- Commit cce6b349f1103c167e2e9ef49fa181ede301b94f in impala's branch refs/heads/master from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=cce6b349f ] IMPALA-13151: Use MonotonicNanos to track test time Uses MonotonicNanos to track test time rather than MonotonicStopWatch. IMPALA-2407 updated MonotonicStopWatch to use a low-precision implementation for performance, which on ARM in particular sometimes results in undercounting time by a few microseconds. That's enough to cause a failure in DataStreamTestSlowServiceQueue.TestPrioritizeEos. Also uses SleepForMs and NANOS_PER_SEC rather than Kudu versions to better match Impala code base. Reproduced on ARM and tested the new implementation for several dozen runs without failure. Change-Id: I9beb63669c5bdd910e5f713ecd42551841e95400 Reviewed-on: http://gerrit.cloudera.org:8080/21497 Reviewed-by: Riza Suminto Tested-by: Impala Public Jenkins > DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM > - > > Key: IMPALA-13151 > URL: https://issues.apache.org/jira/browse/IMPALA-13151 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.5.0 >Reporter: Joe McDonnell >Assignee: Michael Smith >Priority: Critical > Labels: broken-build > > The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is > failing with errors like this: > {noformat} > /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912 > Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), > actual: 269834 vs 30{noformat} > So far, I only see failures on ARM jobs. 
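The pattern the fix adopts, reading a high-resolution monotonic clock directly rather than a coarse stopwatch, can be sketched with std::chrono. {{MonotonicNanos}} below is a stand-in for Impala's utility of the same name, not its actual implementation:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Sketch of the IMPALA-13151 fix: time a waiting section with a
// high-resolution monotonic clock, so a sleep of N ms is never measured as
// shorter than N ms (a coarse clock can undercount by a few microseconds).
int64_t MonotonicNanos() {
  return std::chrono::duration_cast<std::chrono::nanoseconds>(
             std::chrono::steady_clock::now().time_since_epoch())
      .count();
}

// Sleeps for 'sleep_ms' and returns the elapsed time in nanoseconds.
int64_t TimeSleepNanos(int64_t sleep_ms) {
  int64_t start = MonotonicNanos();
  std::this_thread::sleep_for(std::chrono::milliseconds(sleep_ms));
  return MonotonicNanos() - start;
}
```

The test assertion `timer.ElapsedTime() > 3 seconds` only holds reliably when the timer and the sleep are measured against a clock at least as precise as the sleep itself, which is why the coarse stopwatch occasionally failed on ARM.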
[jira] [Commented] (IMPALA-2407) Nested Types : Remove calls to clock_gettime for a 9x performance improvement on EC2
[ https://issues.apache.org/jira/browse/IMPALA-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853922#comment-17853922 ] ASF subversion and git services commented on IMPALA-2407: - Commit cce6b349f1103c167e2e9ef49fa181ede301b94f in impala's branch refs/heads/master from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=cce6b349f ] IMPALA-13151: Use MonotonicNanos to track test time Uses MonotonicNanos to track test time rather than MonotonicStopWatch. IMPALA-2407 updated MonotonicStopWatch to use a low-precision implementation for performance, which on ARM in particular sometimes results in undercounting time by a few microseconds. That's enough to cause a failure in DataStreamTestSlowServiceQueue.TestPrioritizeEos. Also uses SleepForMs and NANOS_PER_SEC rather than Kudu versions to better match Impala code base. Reproduced on ARM and tested the new implementation for several dozen runs without failure. Change-Id: I9beb63669c5bdd910e5f713ecd42551841e95400 Reviewed-on: http://gerrit.cloudera.org:8080/21497 Reviewed-by: Riza Suminto Tested-by: Impala Public Jenkins > Nested Types : Remove calls to clock_gettime for a 9x performance improvement > on EC2 > > > Key: IMPALA-2407 > URL: https://issues.apache.org/jira/browse/IMPALA-2407 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.3.0 >Reporter: Mostafa Mokhtar >Assignee: Jim Apple >Priority: Critical > Labels: ec2, performance, ramp-up > Fix For: Impala 2.5.0 > > Attachments: q12Nested.tar.gz > > > Queries against Nested types show that ~90% of the time is spent in > clock_gettime. 
> A cheaper accounting method can speed up Nested queries by 8-9x > {code} > select > count(*) > from > customer.orders_string o, > o.lineitems_string l > where > l_shipmode in ('MAIL', 'SHIP') > and l_commitdate < l_receiptdate > and l_shipdate < l_commitdate > and l_receiptdate >= '1994-01-01' > and l_receiptdate < '1995-01-01' > group by > l_shipmode > order by > l_shipmode > {code} > Schema > +---+--+-+ > > > | name | type | comment | > > > +---+--+-+ > > > | c_custkey | bigint | | > > > | c_name| string | | > > > | c_address | string | | > > > | c_nationkey | bigint | | > | c_phone | string | | > | c_acctbal | double | | > | c_mktsegment | string | | > | c_comment | string | | > | orders_string | array | | o_orderkey:bigint, | | > | | o_orderstatus:string, | | > | | o_totalprice:double, | | > | | o_orderdate:string,| | > | | o_orderpriority:string,| | > | | o_clerk:string,| | > | | o_shippriority:bigint, | | > | | o_comment:string, | | > | | lineitems_string:array | | l_partkey:bigint,| | > | | l_suppkey:bigint,| | > | | l_linenumber:bigint, | | > | | l_quantity:double, | | > | | l_extendedprice:double, | | > | | l_discount:double, | | > | | l_tax:double,|
[jira] [Commented] (IMPALA-10182) Rows with NULLs filtered out with duplicate columns in subquery select inside UNION ALL
[ https://issues.apache.org/jira/browse/IMPALA-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853920#comment-17853920 ] ASF subversion and git services commented on IMPALA-10182: -- Commit 800246add5fcb20c34a767870346f6ce255e41f9 in impala's branch refs/heads/master from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=800246add ] IMPALA-12800: Use HashMap for ExprSubstitutionMap lookups Adds a HashMap to ExprSubstitutionMap to speed lookups while retaining lists for correct ordering (ordering needs to match to SlotRef order). Ignores duplicate inserts, preserving the old behavior that only the first match would actually be usable; duplicates primarily show up as a result of combining duplicate distinct and aggregate expressions, or redundant nested aggregation (like the tests for IMPALA-10182). Implements localHash and hashCode for Expr and related classes. Avoids deep-cloning LHS Exprs in ExprSubstitutionMap as they're used for lookup and not expected to be mutated. Adds the many expressions test, which now runs in a handful of seconds. 
Change-Id: Ic538a82c69ee1dd76981fbacf95289c9d00ea9fe Reviewed-on: http://gerrit.cloudera.org:8080/21483 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Rows with NULLs filtered out with duplicate columns in subquery select inside > UNION ALL > --- > > Key: IMPALA-10182 > URL: https://issues.apache.org/jira/browse/IMPALA-10182 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Tim Armstrong >Assignee: Aman Sinha >Priority: Blocker > Labels: correctness > Fix For: Impala 4.0.0 > > > Bug report from here - > https://community.cloudera.com/t5/Support-Questions/quot-union-all-quot-dropping-records-with-all-null-empty/m-p/303153#M221415 > Repro: > {noformat} > create database if not exists as_adventure; > use as_adventure; > CREATE tABLE IF NOT EXISTS > as_adventure.t1 > ( > productsubcategorykey INT, > productline STRING); > insert into t1 values (1,'l1'); > insert into t1 values (2,'l1'); > insert into t1 values (1,'l2'); > insert into t1 values (3,'l3'); > insert into t1 values (null,''); > select * from t1; > SELECT > MIN(t_53.c_41) c_41, > CAST(NULL AS DOUBLE) c_43, > CAST(NULL AS BIGINT) c_44, > t_53.c2 c2, > t_53.c3s0c3s0, > t_53.c4 c4, > t_53.c5s0c5s0 > FROM > ( SELECT > t.productsubcategorykey c_41, > t.productline c2, > t.productline c3s0, > t.productsubcategorykey c4, > t.productsubcategorykey c5s0 > FROM > as_adventure.t1 t > WHERE > true > GROUP BY > 2, > 3, > 4, > 5 ) t_53 > GROUP BY > 4, > 5, > 6, > 7 > > UNION ALL > SELECT > MIN(t_53.c_41) c_41, > CAST(NULL AS DOUBLE) c_43, > CAST(NULL AS BIGINT) c_44, > t_53.c2 c2, > t_53.c3s0c3s0, > t_53.c4 c4, > t_53.c5s0c5s0 > FROM > ( SELECT > t.productsubcategorykey c_41, > t.productline c2, > t.productline c3s0, > t.productsubcategorykey c4, > t.productsubcategorykey c5s0 > FROM > as_adventure.t1 t > WHERE > true > GROUP BY > 2, > 3, > 4, > 5 ) t_53 > GROUP BY > 4, > 5, > 6, > 7 > {noformat} > Somewhat similar to IMPALA-7957 in that the inferred predicates from the > column 
equivalences get placed in a Select node. It's a bit different in that > the NULLs that are filtered out from the predicates come from the base table. > {noformat} > ++ > | Explain String >| > ++ > | Max Per-Host Resource Reservation: Memory=136.02MB Threads=6 >| > | Per-Host Resource Estimates: Memory=576MB >| > | WARNI
[jira] [Commented] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS
[ https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853923#comment-17853923 ] ASF subversion and git services commented on IMPALA-11871: -- Commit f7e629935b77f412bf74aeebd704af88f03de351 in impala's branch refs/heads/master from halim.kim [ https://gitbox.apache.org/repos/asf?p=impala.git;h=f7e629935 ] IMPALA-11871: Skip permissions loading and check on HDFS if Ranger is enabled Before this patch, Impala checked whether the Impala service user had the WRITE access to the target HDFS table/partition(s) during the analysis of the INSERT and LOAD DATA statements in the legacy catalog mode. The access levels of the corresponding HDFS table and partitions were computed by the catalog server solely based on the HDFS permissions and ACLs when the table and partitions were instantiated. After this patch, we skip loading HDFS permissions and assume the Impala service user has the READ_WRITE permission on all the HDFS paths associated with the target table during query analysis when Ranger is enabled. The assumption could be removed after Impala's implementation of FsPermissionChecker could additionally take Ranger's policies of HDFS into consideration when performing the check. Testing: - Added end-to-end tests to verify Impala's behavior with respect to the INSERT and LOAD DATA statements when Ranger is enabled in the legacy catalog mode. 
Change-Id: Id33c400fbe0c918b6b65d713b09009512835a4c9 Reviewed-on: http://gerrit.cloudera.org:8080/20221 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > INSERT statement does not respect Ranger policies for HDFS > -- > > Key: IMPALA-11871 > URL: https://issues.apache.org/jira/browse/IMPALA-11871 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > In a cluster with Ranger auth (and with legacy catalog mode), even if you > provide RWX to cm_hdfs -> all-path for the user impala, inserting into a > table whose HDFS POSIX permissions happen to exclude impala access will > result in an > {noformat} > "AnalysisException: Unable to INSERT into target table (default.t1) because > Impala does not have WRITE access to HDFS location: > hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat} > > {noformat} > [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl > /warehouse/tablespace/external/hive/t1 > file: /warehouse/tablespace/external/hive/t1 > owner: hive > group: supergroup > user::rwx > user:impala:rwx #effective:r-x > group::rwx #effective:r-x > mask::r-x > other::--- > default:user::rwx > default:user:impala:rwx > default:group::rwx > default:mask::rwx > default:other::--- {noformat} > ~~ > ANALYSIS > Stack trace from a version of Cloudera's distribution of Impala (impalad > version 3.4.0-SNAPSHOT RELEASE (build > {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})): > {noformat} > at > org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585) > at > org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545) > at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391) > at > org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463) > at > org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426) > at 
org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570) > at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536) > at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155){noformat} > The exception occurs at analysis time, so I tested and succeeded in writing > directly into the said directory. > {noformat} > [root@nightly-71x-vx-3 ~]# hdfs dfs -touchz > /warehouse/tablespace/external/hive/t1/test > [root@nightly-71x-vx-3 ~]# hdfs dfs -ls > /warehouse/tablespace/external/hive/t1/ > Found 8 items > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:37 > /warehouse/tablespace/external/hive/t1/00_0 > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:44 > /warehouse/tablespace/external/hive/t1/00_0_copy_1 > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:49 > /warehouse/tablespace/external/hive/t1/00_0_copy_2 > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:53 > /warehouse/tablespace/external/hive/t1/00_0_copy_3 > rw-rw---+ 3 im
[jira] [Created] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns
Quanlong Huang created IMPALA-13152: --- Summary: IllegalStateException in computing processing cost when there are predicates on analytic output columns Key: IMPALA-13152 URL: https://issues.apache.org/jira/browse/IMPALA-13152 Project: IMPALA Issue Type: Bug Components: Frontend Reporter: Quanlong Huang Assignee: Riza Suminto Saw an error in the following query when COMPUTE_PROCESSING_COST is on: {code:sql} create table tbl (a int, b int, c int); set COMPUTE_PROCESSING_COST=1; explain select a, b from ( select a, b, c, row_number() over(partition by a order by b desc) as latest from tbl )b WHERE latest=1 ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid! {code} Exception in the logs: {noformat} I0611 13:04:37.192874 28004 jni-util.cc:321] 264ee79bfb6ac031:42f8006c] java.lang.IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid! at com.google.common.base.Preconditions.checkState(Preconditions.java:512) at org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047) at org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287) at org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560) at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932) at org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892) at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676) at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224) at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985) at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175){noformat} The error does not appear if the predicate "latest=1" is removed.
[jira] [Commented] (IMPALA-13093) Insert into Huawei OBS table failed
[ https://issues.apache.org/jira/browse/IMPALA-13093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853843#comment-17853843 ] Quanlong Huang commented on IMPALA-13093: - It seems adding this to hdfs-site.xml can also fix the issue: {code:xml} <property> <name>fs.obs.file.visibility.enable</name> <value>true</value> </property> {code} I'll check whether OBS returns the real block size. CC [~michaelsmith] [~eyizoha] > Insert into Huawei OBS table failed > --- > > Key: IMPALA-13093 > URL: https://issues.apache.org/jira/browse/IMPALA-13093 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.3.0 >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > > Inserting into a table that uses Huawei OBS (Object Storage Service) as the > storage will fail with the following error: > {noformat} > Query: insert into test_obs1 values (1, 'abc') > ERROR: Failed to get info on temporary HDFS file: > obs://obs-test-ee93/input/test_obs1/_impala_insert_staging/fe4ac1be6462a13f_362a9b5b/.fe4ac1be6462a13f-362a9b5b_1213692075_dir//fe4ac1be6462a13f-362a9b5b_375832652_data.0.txt > Error(2): No such file or directory {noformat} > Looking into the logs: > {noformat} > I0516 16:40:55.663640 18922 status.cc:129] fe4ac1be6462a13f:362a9b5b] > Failed to get info on temporary HDFS file: > obs://obs-test-ee93/input/test_obs1/_impala_insert_staging/fe4ac1be6462a13f_362a9b5b/.fe4ac1be6462a13f-362a9b5b_1213692075_dir//fe4ac1be6462a13f-362a9b5b_375832652_data.0.txt > Error(2): No such file or directory > @ 0xfc6d44 impala::Status::Status() > @ 0x1c42020 impala::HdfsTableSink::CreateNewTmpFile() > @ 0x1c44357 impala::HdfsTableSink::InitOutputPartition() > @ 0x1c4988a impala::HdfsTableSink::GetOutputPartition() > @ 0x1c46569 impala::HdfsTableSink::Send() > @ 0x14ee25f impala::FragmentInstanceState::ExecInternal() > @ 0x14efca3 impala::FragmentInstanceState::Exec() > @ 0x148dc4c impala::QueryState::ExecFInstance() > @ 0x1b3bab9 impala::Thread::SuperviseThread() > @ 0x1b3cdb1 
boost::detail::thread_data<>::run() > @ 0x2474a87 thread_proxy > @ 0x7fe5a562dea5 start_thread > @ 0x7fe5a25ddb0d __clone{noformat} > Note that impalad is started with {{--symbolize_stacktrace=true}} so the > stacktrace has symbols.
[jira] [Commented] (IMPALA-13077) Equality predicate on partition column and uncorrelated subquery doesn't reduce the cardinality estimate
[ https://issues.apache.org/jira/browse/IMPALA-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853842#comment-17853842 ] Riza Suminto commented on IMPALA-13077: --- Looks like this is a bug in terms of calculating lhsNdv and rhsNdv. In the current code, if either the Ndv or the Cardinality of an equality expression is unknown (-1), getSemiJoinCardinality will skip that expression. [https://github.com/apache/impala/blob/e7dac008bbafb20e4c7d15d46f2bac9a757f/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L720-L726] If Ndv is unknown, but Cardinality is known, that code should assume Cardinality as Ndv instead. I tested that hack and confirmed that it lowers the join cardinality, as seen in the logs: {code:java} I0610 17:09:25.739796 3972670 JoinNode.java:719] 774dd75ed2b1fc53:c78b86b2] eqJoinConjuncts_.size=1 I0610 17:09:25.739863 3972670 JoinNode.java:755] 774dd75ed2b1fc53:c78b86b2] getSemiJoinCardinality calculate selectivity for (ss_sold_date_sk = min(d_date_sk)) as 5.482456140350877E-4 I0610 17:09:25.739918 3972670 JoinNode.java:760] 774dd75ed2b1fc53:c78b86b2] getSemiJoinCardinality has minSelectivity=5.482456140350877E-4 I0610 17:09:25.739933 3972670 JoinNode.java:762] 774dd75ed2b1fc53:c78b86b2] Changed cardinality from 2880404 to 1579 I0610 17:09:25.739966 3972670 JoinNode.java:866] 774dd75ed2b1fc53:c78b86b2] stats Join: cardinality=1579{code} > Equality predicate on partition column and uncorrelated subquery doesn't > reduce the cardinality estimate > > > Key: IMPALA-13077 > URL: https://issues.apache.org/jira/browse/IMPALA-13077 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > > Let's say 'part_tbl' is a partitioned table. Its partition key is 'part_key'. > Consider the following query: > {code:sql} > select xxx from part_tbl > where part_key=(select ... from dim_tbl); > {code} > Its query plan is a JoinNode with two ScanNodes. 
When estimating the > cardinality of the JoinNode, the planner is not aware that 'part_key' is the > partition column and the cardinality of the JoinNode should not be larger > than the max row count across partitions. > The recent work in IMPALA-12018 (Consider runtime filter for cardinality > reduction) helps in some cases since there are runtime filters on the > partition column. But there are still some cases that we overestimate the > cardinality. For instance, 'ss_sold_date_sk' is the only partition key of > tpcds.store_sales. The following query > {code:sql} > select count(*) from tpcds.store_sales > where ss_sold_date_sk=( > select min(d_date_sk) + 1000 from tpcds.date_dim);{code} > has query plan: > {noformat} > +-+ > | Explain String | > +-+ > | Max Per-Host Resource Reservation: Memory=18.94MB Threads=6 | > | Per-Host Resource Estimates: Memory=243MB | > | | > | PLAN-ROOT SINK | > | | | > | 09:AGGREGATE [FINALIZE] | > | | output: count:merge(*) | > | | row-size=8B cardinality=1| > | | | > | 08:EXCHANGE [UNPARTITIONED] | > | | | > | 04:AGGREGATE| > | | output: count(*) | > | | row-size=8B cardinality=1| > | | | > | 03:HASH JOIN [LEFT SEMI JOIN, BROADCAST]| > | | hash predicates: ss_sold_date_sk = min(d_date_sk) + 1000 | > | | runtime filters: RF000 <- min(d_date_sk) + 1000 | > | | row-size=4B cardinality=2.88M < Should be max(numRows) across > partitions > | | | > | |--07:EXCHANGE [BROADCAST] | > | | || > | | 06:AGGREGATE [FINALIZE] | > | | | output: min:merge(d_date_sk)
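The NDV fallback Riza describes, substituting a side's cardinality when its NDV is unknown (-1) rather than skipping the conjunct, can be sketched as follows. This is a hypothetical helper in C++ for illustration (the real logic is Java, in JoinNode.getSemiJoinCardinality()); it assumes the common semi-join estimate of selectivity = 1 / max(NDV), which matches the logged 5.48E-4 reduction of 2880404 rows to 1579.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the IMPALA-13077 idea: estimate semi-join selectivity per equality
// conjunct, falling back to a side's cardinality when its NDV is unknown (-1)
// instead of skipping the conjunct entirely.
double SemiJoinSelectivity(int64_t lhs_ndv, int64_t lhs_card,
                           int64_t rhs_ndv, int64_t rhs_card) {
  if (lhs_ndv < 0) lhs_ndv = lhs_card;  // proposed fallback: card as NDV
  if (rhs_ndv < 0) rhs_ndv = rhs_card;
  if (lhs_ndv <= 0 || rhs_ndv <= 0) return 1.0;  // still unknown: no reduction
  int64_t max_ndv = lhs_ndv > rhs_ndv ? lhs_ndv : rhs_ndv;
  return 1.0 / static_cast<double>(max_ndv);
}
```

Without the fallback, an unknown NDV makes the function return the neutral selectivity of 1.0, which is why the predicate on the partition column currently fails to shrink the join's cardinality estimate.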
[jira] [Updated] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
[ https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith updated IMPALA-13151: --- Affects Version/s: Impala 4.5.0 (was: Impala 4.4.0) > DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM > - > > Key: IMPALA-13151 > URL: https://issues.apache.org/jira/browse/IMPALA-13151 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.5.0 >Reporter: Joe McDonnell >Assignee: Michael Smith >Priority: Critical > Labels: broken-build > > The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is > failing with errors like this: > {noformat} > /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912 > Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), > actual: 269834 vs 30{noformat} > So far, I only see failures on ARM jobs.
[jira] [Commented] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
[ https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853824#comment-17853824 ] Michael Smith commented on IMPALA-13151: Oh, more likely that MonotonicStopWatch is less precise because https://github.com/apache/impala/blob/4.4.0/be/src/util/stopwatch.h#L159-L163. > DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM > - > > Key: IMPALA-13151 > URL: https://issues.apache.org/jira/browse/IMPALA-13151 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.4.0 >Reporter: Joe McDonnell >Assignee: Michael Smith >Priority: Critical > Labels: broken-build > > The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is > failing with errors like this: > {noformat} > /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912 > Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), > actual: 269834 vs 30{noformat} > So far, I only see failures on ARM jobs.
[jira] [Work started] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
[ https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-13151 started by Michael Smith. -- > DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM > - > > Key: IMPALA-13151 > URL: https://issues.apache.org/jira/browse/IMPALA-13151 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.4.0 >Reporter: Joe McDonnell >Assignee: Michael Smith >Priority: Critical > Labels: broken-build > > The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is > failing with errors like this: > {noformat} > /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912 > Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), > actual: 269834 vs 30{noformat} > So far, I only see failures on ARM jobs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
[ https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853778#comment-17853778 ] Michael Smith commented on IMPALA-13151: I'm tempted to make that a fuzzy comparison. Maybe the sleep method used for debug actions is slightly less precise than the timer. > DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM > - > > Key: IMPALA-13151 > URL: https://issues.apache.org/jira/browse/IMPALA-13151 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.4.0 >Reporter: Joe McDonnell >Assignee: Michael Smith >Priority: Critical > Labels: broken-build > > The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is > failing with errors like this: > {noformat} > /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912 > Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), > actual: 269834 vs 30{noformat} > So far, I only see failures on ARM jobs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
[ https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe McDonnell reassigned IMPALA-13151: -- Assignee: Michael Smith > DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM > - > > Key: IMPALA-13151 > URL: https://issues.apache.org/jira/browse/IMPALA-13151 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.4.0 >Reporter: Joe McDonnell >Assignee: Michael Smith >Priority: Critical > Labels: broken-build > > The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is > failing with errors like this: > {noformat} > /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912 > Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), > actual: 269834 vs 30{noformat} > So far, I only see failures on ARM jobs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
Joe McDonnell created IMPALA-13151: -- Summary: DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM Key: IMPALA-13151 URL: https://issues.apache.org/jira/browse/IMPALA-13151 Project: IMPALA Issue Type: Bug Components: Backend Affects Versions: Impala 4.4.0 Reporter: Joe McDonnell The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is failing with errors like this: {noformat} /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912 Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), actual: 269834 vs 30{noformat} So far, I only see failures on ARM jobs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-13126) ReloadEvent.isOlderEvent() should hold the table read lock
[ https://issues.apache.org/jira/browse/IMPALA-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sai Hemanth Gantasala updated IMPALA-13126: --- Labels: catalog-2024 (was: ) > ReloadEvent.isOlderEvent() should hold the table read lock > -- > > Key: IMPALA-13126 > URL: https://issues.apache.org/jira/browse/IMPALA-13126 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Sai Hemanth Gantasala >Priority: Critical > Labels: catalog-2024 > > Saw an exception like this: > {noformat} > E0601 09:11:25.275251 246 MetastoreEventsProcessor.java:990] Unexpected > exception received while processing event > Java exception follows: > java.util.ConcurrentModificationException > at java.util.HashMap$HashIterator.nextNode(HashMap.java:1469) > at java.util.HashMap$ValueIterator.next(HashMap.java:1498) > at > org.apache.impala.catalog.FeFsTable$Utils.getPartitionFromThriftPartitionSpec(FeFsTable.java:616) > at > org.apache.impala.catalog.HdfsTable.getPartitionFromThriftPartitionSpec(HdfsTable.java:597) > at > org.apache.impala.catalog.Catalog.getHdfsPartition(Catalog.java:511) > at > org.apache.impala.catalog.Catalog.getHdfsPartition(Catalog.java:489) > at > org.apache.impala.catalog.CatalogServiceCatalog.isPartitionLoadedAfterEvent(CatalogServiceCatalog.java:4024) > at > org.apache.impala.catalog.events.MetastoreEvents$ReloadEvent.isOlderEvent(MetastoreEvents.java:2754) > at > org.apache.impala.catalog.events.MetastoreEvents$ReloadEvent.processTableEvent(MetastoreEvents.java:2729) > at > org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.process(MetastoreEvents.java:1107) > at > org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:531) > at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1164) > at > 
org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:972) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:750) {noformat} > For a partition-level RELOAD event, ReloadEvent.isOlderEvent() needs to check > whether the corresponding partition is reloaded after the event. This should > be done after holding the table read lock. Otherwise, EventProcessor could > hit the error above when there are concurrent DDLs/DMLs modifying the > partition list. > CC [~VenuReddy] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-13146) Javascript tests sometimes fail to download NodeJS
[ https://issues.apache.org/jira/browse/IMPALA-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853729#comment-17853729 ] ASF subversion and git services commented on IMPALA-13146: -- Commit e7dac008bbafb20e4c7d15d46f2bac9a757f in impala's branch refs/heads/master from Joe McDonnell [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e7dac008b ] IMPALA-13146: Download NodeJS from native toolchain Some test runs have had issues downloading the NodeJS tarball from the nodejs servers. This changes the test to download from our native toolchain to make this more reliable. This means that future upgrades to NodeJS will need to upload new tarballs to the native toolchain. Testing: - Ran x86_64/ARM javascript tests Change-Id: I1def801469cb68633e89b4a0f3c07a771febe599 Reviewed-on: http://gerrit.cloudera.org:8080/21494 Tested-by: Impala Public Jenkins Reviewed-by: Surya Hebbar Reviewed-by: Wenzhe Zhou > Javascript tests sometimes fail to download NodeJS > -- > > Key: IMPALA-13146 > URL: https://issues.apache.org/jira/browse/IMPALA-13146 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 4.5.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Critical > Labels: broken-build, flaky > > For automated tests, sometimes the Javascript tests fail to download NodeJS: > {noformat} > 01:37:16 Fetching NodeJS v16.20.2-linux-x64 binaries ... > 01:37:16 % Total% Received % Xferd Average Speed TimeTime > Time Current > 01:37:16 Dload Upload Total Spent > Left Speed > 01:37:16 > 0 00 00 0 0 0 --:--:-- --:--:-- --:--:-- 0 > 0 00 00 0 0 0 --:--:-- 0:00:01 --:--:-- 0 > 0 00 00 0 0 0 --:--:-- 0:00:02 --:--:-- 0 > 0 21.5M0 9020 0293 0 21:23:04 0:00:03 21:23:01 293 > ... > 30 21.5M 30 6776k 0 0 50307 0 0:07:28 0:02:17 0:05:11 23826 > 01:39:34 curl: (18) transfer closed with 15617860 bytes remaining to > read{noformat} > If this keeps happening, we should mirror the NodeJS binary on the > native-toolchain s3 bucket. 
[jira] [Created] (IMPALA-13150) Possible buffer overflow in StringVal
Daniel Becker created IMPALA-13150: -- Summary: Possible buffer overflow in StringVal Key: IMPALA-13150 URL: https://issues.apache.org/jira/browse/IMPALA-13150 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Daniel Becker Assignee: Daniel Becker In {{StringVal::CopyFrom()}}, we take the 'len' parameter as a {{size_t}}, which is usually a 64-bit unsigned integer. We pass it to the constructor of {{StringVal}}, which takes it as an {{int}}, which is usually a 32-bit signed integer. The constructor then allocates memory for the length using the {{int}} value, but back in {{CopyFrom()}}, we copy the buffer with the {{size_t}} length. If {{size_t}} is indeed 64 bits and {{int}} is 32 bits, and the value is truncated, we may copy more bytes than what we have allocated the destination for. See https://github.com/apache/impala/blob/ce8078204e5995277f79e226e26fe8b9eaca408b/be/src/udf/udf.cc#L546
[jira] [Created] (IMPALA-13149) Show JVM info in the WebUI
Quanlong Huang created IMPALA-13149: --- Summary: Show JVM info in the WebUI Key: IMPALA-13149 URL: https://issues.apache.org/jira/browse/IMPALA-13149 Project: IMPALA Issue Type: New Feature Reporter: Quanlong Huang It'd be helpful to show the JVM info in the WebUI, e.g. show the output of "java -version": {code:java} openjdk version "1.8.0_412" OpenJDK Runtime Environment (build 1.8.0_412-b08) OpenJDK 64-Bit Server VM (build 25.412-b08, mixed mode){code} On nodes that only have a JRE deployed, we'd like to deploy the same version of the JDK to perform heap dumps (jmap), so showing the JVM info in the WebUI will be useful.
[jira] [Updated] (IMPALA-13148) Show the number of in-progress Catalog operations
[ https://issues.apache.org/jira/browse/IMPALA-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13148: Attachment: Selection_123.png Selection_122.png > Show the number of in-progress Catalog operations > - > > Key: IMPALA-13148 > URL: https://issues.apache.org/jira/browse/IMPALA-13148 > Project: IMPALA > Issue Type: Improvement >Reporter: Quanlong Huang >Priority: Major > Labels: newbie, ramp-up > Attachments: Selection_122.png, Selection_123.png > > > In the /operations page of catalogd WebUI, the list of In-progress Catalog > Operations are shown. It'd be helpful to also show the number of such > operations. Like in the /queries page of coordinator WebUI, it shows 100 > queries in flight. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13148) Show the number of in-progress Catalog operations
Quanlong Huang created IMPALA-13148: --- Summary: Show the number of in-progress Catalog operations Key: IMPALA-13148 URL: https://issues.apache.org/jira/browse/IMPALA-13148 Project: IMPALA Issue Type: Improvement Reporter: Quanlong Huang Attachments: Selection_122.png, Selection_123.png In the /operations page of catalogd WebUI, the list of In-progress Catalog Operations are shown. It'd be helpful to also show the number of such operations. Like in the /queries page of coordinator WebUI, it shows 100 queries in flight. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-13143) TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query failure
[ https://issues.apache.org/jira/browse/IMPALA-13143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou resolved IMPALA-13143. -- Fix Version/s: Impala 4.5.0 Resolution: Fixed > TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query > failure > - > > Key: IMPALA-13143 > URL: https://issues.apache.org/jira/browse/IMPALA-13143 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.5.0 >Reporter: Joe McDonnell >Assignee: Wenzhe Zhou >Priority: Critical > Labels: broken-build, flaky > Fix For: Impala 4.5.0 > > > The new TestCatalogdHA.test_catalogd_failover_with_sync_ddl test is failing > intermittently with: > {noformat} > custom_cluster/test_catalogd_ha.py:472: in > test_catalogd_failover_with_sync_ddl > self.wait_for_state(handle, QueryState.EXCEPTION, 30, client=client) > common/impala_test_suite.py:1216: in wait_for_state > self.wait_for_any_state(handle, [expected_state], timeout, client) > common/impala_test_suite.py:1234: in wait_for_any_state > raise Timeout(timeout_msg) > E Timeout: query '9d49ab6360f6cbc5:4826a796' did not reach one of > the expected states [5], last known state 4{noformat} > This means the query succeeded even though we expected it to fail. This is > currently limited to s3 jobs. In a different test, we saw issues because s3 > is slower (see IMPALA-12616). > This test was introduced by IMPALA-13134: > https://github.com/apache/impala/commit/70b7b6a78d49c30933d79e0a1c2a725f7e0a3e50 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12266) Sporadic failure after migrating a table to Iceberg
[ https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853512#comment-17853512 ] Fang-Yu Rao commented on IMPALA-12266: -- Encountered this failure again at [https://jenkins.impala.io/job/ubuntu-20.04-dockerised-tests/1873/testReport/junit/query_test.test_iceberg/TestIcebergTable/test_convert_table_protocol__beeswax___exec_optiontest_replan___1___batch_size___0___num_nodes___0___disable_codegen_rows_threshold___0___disable_codegen___False___abort_on_error___1___exec_single_node_rows_threshold___0table_format__parquet_none_/] in a Jenkins job against [https://gerrit.cloudera.org/c/21160/], which did not change Impala's behavior in this area. > Sporadic failure after migrating a table to Iceberg > --- > > Key: IMPALA-12266 > URL: https://issues.apache.org/jira/browse/IMPALA-12266 > Project: IMPALA > Issue Type: Bug > Components: fe >Affects Versions: Impala 4.2.0 >Reporter: Tamas Mate >Assignee: Gabor Kaszab >Priority: Critical > Labels: impala-iceberg > Attachments: > catalogd.bd40020df22b.invalid-user.log.INFO.20230704-181939.1, > impalad.6c0f48d9ce66.invalid-user.log.INFO.20230704-181940.1 > > > TestIcebergTable.test_convert_table test failed in a recent verify job's > dockerised tests: > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/7629 > {code:none} > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER EXCEPTION: > EMESSAGE: AnalysisException: Failed to load metadata for table: > 'parquet_nopartitioned' > E CAUSED BY: TableLoadingException: Could not load table > test_convert_table_cdba7383.parquet_nopartitioned from catalog > E CAUSED BY: TException: > TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, > error_msgs:[NullPointerException: null]), lookup_status:OK) > {code} > {code:none} > E0704 19:09:22.980131 833 JniUtil.java:183] > 7145c21173f2c47b:2579db55] Error in Getting partial catalog object of > 
TABLE:test_convert_table_cdba7383.parquet_nopartitioned. Time spent: 49ms > I0704 19:09:22.980309 833 jni-util.cc:288] > 7145c21173f2c47b:2579db55] java.lang.NullPointerException > at > org.apache.impala.catalog.CatalogServiceCatalog.replaceTableIfUnchanged(CatalogServiceCatalog.java:2357) > at > org.apache.impala.catalog.CatalogServiceCatalog.getOrLoadTable(CatalogServiceCatalog.java:2300) > at > org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3587) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3513) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3480) > at > org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$11(JniCatalog.java:397) > at > org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90) > at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58) > at > org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89) > at > org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109) > at > org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:238) > at > org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:396) > I0704 19:09:22.980324 833 status.cc:129] 7145c21173f2c47b:2579db55] > NullPointerException: null > @ 0x1012f9f impala::Status::Status() > @ 0x187f964 impala::JniUtil::GetJniExceptionMsg() > @ 0xfee920 impala::JniCall::Call<>() > @ 0xfccd0f impala::Catalog::GetPartialCatalogObject() > @ 0xfb55a5 > impala::CatalogServiceThriftIf::GetPartialCatalogObject() > @ 0xf7a691 > impala::CatalogServiceProcessorT<>::process_GetPartialCatalogObject() > @ 0xf82151 impala::CatalogServiceProcessorT<>::dispatchCall() > @ 0xee330f apache::thrift::TDispatchProcessor::process() > @ 0x1329246 > 
apache::thrift::server::TAcceptQueueServer::Task::run() > @ 0x1315a89 impala::ThriftThread::RunRunnable() > @ 0x131773d > boost::detail::function::void_function_obj_invoker0<>::invoke() > @ 0x195ba8c impala::Thread::SuperviseThread() > @ 0x195c895 boost::detail
[jira] [Commented] (IMPALA-13143) TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query failure
[ https://issues.apache.org/jira/browse/IMPALA-13143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853328#comment-17853328 ] ASF subversion and git services commented on IMPALA-13143: -- Commit bafd1903069163f38812d7fa42f9c4d2f7218fcf in impala's branch refs/heads/master from wzhou-code [ https://gitbox.apache.org/repos/asf?p=impala.git;h=bafd19030 ] IMPALA-13143: Fix flaky test_catalogd_failover_with_sync_ddl The test_catalogd_failover_with_sync_ddl test, which was added to custom_cluster/test_catalogd_ha.py in IMPALA-13134, failed on s3. The test relies on specific timing with a sleep injected via a debug action so that the DDL query is still running when catalogd failover is triggered. The failures were caused by catalogd restarting slowly on s3, so that the query finished before catalogd failover was triggered. This patch fixed the issue by increasing the sleep time for s3 builds and other slow builds. Testing: - Ran the test 100 times in a loop on s3. Change-Id: I15bb6aae23a2f544067f993533e322969372ebd5 Reviewed-on: http://gerrit.cloudera.org:8080/21491 Reviewed-by: Riza Suminto Tested-by: Impala Public Jenkins > TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query > failure > - > > Key: IMPALA-13143 > URL: https://issues.apache.org/jira/browse/IMPALA-13143 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.5.0 >Reporter: Joe McDonnell >Assignee: Wenzhe Zhou >Priority: Critical > Labels: broken-build, flaky > > The new TestCatalogdHA.test_catalogd_failover_with_sync_ddl test is failing > intermittently with: > {noformat} > custom_cluster/test_catalogd_ha.py:472: in > test_catalogd_failover_with_sync_ddl > self.wait_for_state(handle, QueryState.EXCEPTION, 30, client=client) > common/impala_test_suite.py:1216: in wait_for_state > self.wait_for_any_state(handle, [expected_state], timeout, client) > common/impala_test_suite.py:1234: in wait_for_any_state > raise 
Timeout(timeout_msg) > E Timeout: query '9d49ab6360f6cbc5:4826a796' did not reach one of > the expected states [5], last known state 4{noformat} > This means the query succeeded even though we expected it to fail. This is > currently limited to s3 jobs. In a different test, we saw issues because s3 > is slower (see IMPALA-12616). > This test was introduced by IMPALA-13134: > https://github.com/apache/impala/commit/70b7b6a78d49c30933d79e0a1c2a725f7e0a3e50 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-13134) DDL hang with SYNC_DDL enabled when Catalogd is changed to standby status
[ https://issues.apache.org/jira/browse/IMPALA-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853329#comment-17853329 ] ASF subversion and git services commented on IMPALA-13134: -- Commit bafd1903069163f38812d7fa42f9c4d2f7218fcf in impala's branch refs/heads/master from wzhou-code [ https://gitbox.apache.org/repos/asf?p=impala.git;h=bafd19030 ] IMPALA-13143: Fix flaky test_catalogd_failover_with_sync_ddl The test_catalogd_failover_with_sync_ddl test, which was added to custom_cluster/test_catalogd_ha.py in IMPALA-13134, failed on s3. The test relies on specific timing with a sleep injected via a debug action so that the DDL query is still running when catalogd failover is triggered. The failures were caused by catalogd restarting slowly on s3, so that the query finished before catalogd failover was triggered. This patch fixed the issue by increasing the sleep time for s3 builds and other slow builds. Testing: - Ran the test 100 times in a loop on s3. Change-Id: I15bb6aae23a2f544067f993533e322969372ebd5 Reviewed-on: http://gerrit.cloudera.org:8080/21491 Reviewed-by: Riza Suminto Tested-by: Impala Public Jenkins > DDL hang with SYNC_DDL enabled when Catalogd is changed to standby status > - > > Key: IMPALA-13134 > URL: https://issues.apache.org/jira/browse/IMPALA-13134 > Project: IMPALA > Issue Type: Bug > Components: Backend, Catalog >Reporter: Wenzhe Zhou >Assignee: Wenzhe Zhou >Priority: Major > Fix For: Impala 4.5.0 > > > Catalogd waits for SYNC_DDL version when it processes a DDL with SYNC_DDL > enabled. If the status of Catalogd is changed from active to standby when > CatalogServiceCatalog.waitForSyncDdlVersion() is called, the standby catalogd > does not receive catalog topic updates from statestore. This causes the catalogd > thread to wait indefinitely and the DDL query to hang. 
[jira] [Assigned] (IMPALA-13146) Javascript tests sometimes fail to download NodeJS
[ https://issues.apache.org/jira/browse/IMPALA-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe McDonnell reassigned IMPALA-13146: -- Assignee: Joe McDonnell > Javascript tests sometimes fail to download NodeJS > -- > > Key: IMPALA-13146 > URL: https://issues.apache.org/jira/browse/IMPALA-13146 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 4.5.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Critical > Labels: broken-build, flaky > > For automated tests, sometimes the Javascript tests fail to download NodeJS: > {noformat} > 01:37:16 Fetching NodeJS v16.20.2-linux-x64 binaries ... > 01:37:16 % Total% Received % Xferd Average Speed TimeTime > Time Current > 01:37:16 Dload Upload Total Spent > Left Speed > 01:37:16 > 0 00 00 0 0 0 --:--:-- --:--:-- --:--:-- 0 > 0 00 00 0 0 0 --:--:-- 0:00:01 --:--:-- 0 > 0 00 00 0 0 0 --:--:-- 0:00:02 --:--:-- 0 > 0 21.5M0 9020 0293 0 21:23:04 0:00:03 21:23:01 293 > ... > 30 21.5M 30 6776k 0 0 50307 0 0:07:28 0:02:17 0:05:11 23826 > 01:39:34 curl: (18) transfer closed with 15617860 bytes remaining to > read{noformat} > If this keeps happening, we should mirror the NodeJS binary on the > native-toolchain s3 bucket. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13147) Add support for limiting the concurrency of link jobs
Joe McDonnell created IMPALA-13147: -- Summary: Add support for limiting the concurrency of link jobs Key: IMPALA-13147 URL: https://issues.apache.org/jira/browse/IMPALA-13147 Project: IMPALA Issue Type: Improvement Components: Infrastructure Affects Versions: Impala 4.5.0 Reporter: Joe McDonnell Link jobs can use a lot of memory due to the amount of debug info. The level of concurrency that is useful for compilation can be too high for linking. Running a link-heavy command like buildall.sh -skiptests can run out of memory from linking all of the backend tests / benchmarks. It would be useful to be able to limit the number of concurrent link jobs. There are two basic approaches: When using the ninja generator for CMake, ninja supports having job pools with limited parallelism. CMake has support for mapping link tasks to their own pool. Here is an example: {noformat} set(CMAKE_JOB_POOLS compilation_pool=24 link_pool=8) set(CMAKE_JOB_POOL_COMPILE compilation_pool) set(CMAKE_JOB_POOL_LINK link_pool){noformat} The makefile generator does not have equivalent functionality, but we could do a more limited version where buildall.sh can split the -skiptests into two make invocations. The first does all the compilation with full parallelism (equivalent to -notests) and then the second make invocation does the backend tests / benchmarks with a reduced parallelism. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-13096) Cleanup Parser.jj for Calcite planner to only use supported syntax
[ https://issues.apache.org/jira/browse/IMPALA-13096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853235#comment-17853235 ] ASF subversion and git services commented on IMPALA-13096: -- Commit 141f38197be2ca23757cb8b3f283cdb9dd62de47 in impala's branch refs/heads/master from Steve Carlin [ https://gitbox.apache.org/repos/asf?p=impala.git;h=141f38197 ] IMPALA-12935: First pass on Calcite planner functions This commit handles the first pass on getting functions to work through the Calcite planner. Only basic functions will work with this commit. Implicit conversions for parameters are not yet supported. Custom UDFs are also not supported yet. The ImpalaOperatorTable is used at validation time to check for existence of the function name for Impala. At first, it will check Calcite operators for the existence of the function name (A TODO, IMPALA-13096, is that we need to remove non-supported names from the parser file). It is preferable to use the Calcite Operator since Calcite does some optimizations based on the Calcite Operator class. If the name is not found within the Calcite Operators, a check is done within the BuiltinsDb (TODO: IMPALA-13095 handle UDFs) for the function. If found, a SqlOperator class is generated on the fly to handle this function. The validation process for Calcite includes a call into the operator method "inferReturnType". This method will validate that there exists a function that will handle the operands, and if so, return the "return type" of the function. In this commit, we will assume that the Calcite operators will match Impala functionality. In later commits, there will be overrides where we will use Impala validation for operators where Calcite's validation isn't good enough. After validation is complete, the functions will be in a Calcite format. 
After the rest of compilation (relnode conversion, optimization) is complete, the function needs to be converted back into Impala form (the Expr object) to eventually get it into its thrift request. In this commit, all functions are converted into Expr starting in the ImpalaProjectRel, since this is the RelNode where functions do their thing. The RexCallConverter and RexLiteralConverter get called via the CreateExprVisitor for this conversion. Since Calcite is providing the analysis portion of the planning, there is no need to go through Impala's Analyzer object. However, the Impala planner requires Expr objects to be analyzed. To get around this, the AnalyzedFunctionCallExpr and AnalyzedNullLiteral objects exist which analyze the expression in the constructor. While this could potentially be combined with the existing FunctionCallExpr and NullLiteral objects, this fits in with the general plan to avoid changing "fe" Impala code as much as we can until much later in the commit cycle. Also, there will be other Analyzed*Expr classes created in the future, but this commit is intended for basic function call expressions only. One minor change to the parser is added with this commit. The Calcite parser does not acknowledge the "string" datatype, so this has been added here in Parser.jj and config.fmpp. Change-Id: I2dd4e402d69ee10547abeeafe893164ffd789b88 Reviewed-on: http://gerrit.cloudera.org:8080/21357 Reviewed-by: Michael Smith Tested-by: Impala Public Jenkins > Cleanup Parser.jj for Calcite planner to only use supported syntax > -- > > Key: IMPALA-13096 > URL: https://issues.apache.org/jira/browse/IMPALA-13096 > Project: IMPALA > Issue Type: Sub-task >Reporter: Steve Carlin > Priority: Major >
[jira] [Commented] (IMPALA-13095) Handle UDFs in Calcite planner
[ https://issues.apache.org/jira/browse/IMPALA-13095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853236#comment-17853236 ] ASF subversion and git services commented on IMPALA-13095: -- Commit 141f38197be2ca23757cb8b3f283cdb9dd62de47 in impala's branch refs/heads/master from Steve Carlin [ https://gitbox.apache.org/repos/asf?p=impala.git;h=141f38197 ] IMPALA-12935: First pass on Calcite planner functions [Commit message identical to the one quoted in the IMPALA-13096 comment above.] > Handle UDFs in Calcite planner > -- > > Key: IMPALA-13095 > URL: https://issues.apache.org/jira/browse/IMPALA-13095 > Project: IMPALA > Issue Type: Sub-task >Reporter: Steve Carlin > Priority: Major >
[jira] [Commented] (IMPALA-12935) Allow function parsing for Impala Calcite planner
[ https://issues.apache.org/jira/browse/IMPALA-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853234#comment-17853234 ] ASF subversion and git services commented on IMPALA-12935: -- Commit 141f38197be2ca23757cb8b3f283cdb9dd62de47 in impala's branch refs/heads/master from Steve Carlin [ https://gitbox.apache.org/repos/asf?p=impala.git;h=141f38197 ] IMPALA-12935: First pass on Calcite planner functions [Commit message identical to the one quoted in the IMPALA-13096 comment above.] > Allow function parsing for Impala Calcite planner > - > > Key: IMPALA-12935 > URL: https://issues.apache.org/jira/browse/IMPALA-12935 > Project: IMPALA > Issue Type: Sub-task >Reporter: Steve Carlin >Priority: Major > > We need the ability to parse and validate Impala functions using the Calcite > planner. > This commit is not intended to work for all functions, or even most > functions. It will work as a base to be reviewed, and at least some > functions will work. > More complicated functions will be added in a later > commit.
[jira] [Created] (IMPALA-13146) Javascript tests sometimes fail to download NodeJS
Joe McDonnell created IMPALA-13146: -- Summary: Javascript tests sometimes fail to download NodeJS Key: IMPALA-13146 URL: https://issues.apache.org/jira/browse/IMPALA-13146 Project: IMPALA Issue Type: Bug Components: Infrastructure Affects Versions: Impala 4.5.0 Reporter: Joe McDonnell For automated tests, sometimes the Javascript tests fail to download NodeJS: {noformat} 01:37:16 Fetching NodeJS v16.20.2-linux-x64 binaries ... 01:37:16 % Total% Received % Xferd Average Speed TimeTime Time Current 01:37:16 Dload Upload Total SpentLeft Speed 01:37:16 0 00 00 0 0 0 --:--:-- --:--:-- --:--:-- 0 0 00 00 0 0 0 --:--:-- 0:00:01 --:--:-- 0 0 00 00 0 0 0 --:--:-- 0:00:02 --:--:-- 0 0 21.5M0 9020 0293 0 21:23:04 0:00:03 21:23:01 293 ... 30 21.5M 30 6776k 0 0 50307 0 0:07:28 0:02:17 0:05:11 23826 01:39:34 curl: (18) transfer closed with 15617860 bytes remaining to read{noformat} If this keeps happening, we should mirror the NodeJS binary on the native-toolchain s3 bucket.
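Besides mirroring the binary, a flaky transfer like the truncated curl download above can also be mitigated by retrying with backoff. A minimal Python sketch of that idea; fetch_with_retry is a hypothetical helper, not part of Impala's build scripts:

```python
import time


def fetch_with_retry(fetch, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry a flaky download with exponential backoff.

    'fetch' is any callable that performs the transfer and raises OSError on
    failure (e.g. a truncated HTTP transfer). Re-raises the last error if all
    attempts fail.
    """
    last = None
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError as exc:
            last = exc
            if attempt < attempts - 1:
                # Back off before retrying: 1s, 2s, 4s, ...
                sleep(base_delay * (2 ** attempt))
    raise last
```

The same effect can often be had directly from curl with its --retry option.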
[jira] [Resolved] (IMPALA-13130) Under heavy load, Impala does not prioritize data stream operations
[ https://issues.apache.org/jira/browse/IMPALA-13130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith resolved IMPALA-13130. Fix Version/s: Impala 4.5.0 Resolution: Fixed > Under heavy load, Impala does not prioritize data stream operations > --- > > Key: IMPALA-13130 > URL: https://issues.apache.org/jira/browse/IMPALA-13130 > Project: IMPALA > Issue Type: Bug >Reporter: Michael Smith >Assignee: Michael Smith >Priority: Major > Fix For: Impala 4.5.0 > > > Under heavy load - where Impala reaches max memory for the DataStreamService > and applies backpressure via > https://github.com/apache/impala/blob/4.4.0/be/src/rpc/impala-service-pool.cc#L191-L199 > - DataStreamService does not differentiate between types of requests and may > reject requests that could help reduce load. > The DataStreamService deals with TransmitData, PublishFilter, UpdateFilter, > UpdateFilterFromRemote, and EndDataStream. It seems like we should prioritize > completing EndDataStream, especially under heavy load, to complete work and > release resources more quickly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-13143) TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query failure
[ https://issues.apache.org/jira/browse/IMPALA-13143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou reassigned IMPALA-13143: Assignee: Wenzhe Zhou > TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query > failure > - > > Key: IMPALA-13143 > URL: https://issues.apache.org/jira/browse/IMPALA-13143 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.5.0 >Reporter: Joe McDonnell >Assignee: Wenzhe Zhou >Priority: Critical > Labels: broken-build, flaky > > The new TestCatalogdHA.test_catalogd_failover_with_sync_ddl test is failing > intermittently with: > {noformat} > custom_cluster/test_catalogd_ha.py:472: in > test_catalogd_failover_with_sync_ddl > self.wait_for_state(handle, QueryState.EXCEPTION, 30, client=client) > common/impala_test_suite.py:1216: in wait_for_state > self.wait_for_any_state(handle, [expected_state], timeout, client) > common/impala_test_suite.py:1234: in wait_for_any_state > raise Timeout(timeout_msg) > E Timeout: query '9d49ab6360f6cbc5:4826a796' did not reach one of > the expected states [5], last known state 4{noformat} > This means the query succeeded even though we expected it to fail. This is > currently limited to s3 jobs. In a different test, we saw issues because s3 > is slower (see IMPALA-12616). > This test was introduced by IMPALA-13134: > https://github.com/apache/impala/commit/70b7b6a78d49c30933d79e0a1c2a725f7e0a3e50 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13145) Upgrade mold linker to 2.31.0
Joe McDonnell created IMPALA-13145: -- Summary: Upgrade mold linker to 2.31.0 Key: IMPALA-13145 URL: https://issues.apache.org/jira/browse/IMPALA-13145 Project: IMPALA Issue Type: Improvement Components: Infrastructure Affects Versions: Impala 4.5.0 Reporter: Joe McDonnell Mold 2.31.0 claims performance improvements and a reduction in the memory needed for linking. See [https://github.com/rui314/mold/releases/tag/v2.31.0] and [https://github.com/rui314/mold/commit/53ebcd80d888778cde16952270f73343f090f342] We should move to that version as some developers are seeing issues with high memory usage for linking.
[jira] [Commented] (IMPALA-12967) Testcase fails at test_migrated_table_field_id_resolution due to "Table does not exist"
[ https://issues.apache.org/jira/browse/IMPALA-12967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853224#comment-17853224 ] Joe McDonnell commented on IMPALA-12967: There is a separate symptom where this test fails with a Disk I/O error. It is probably somewhat related, so we need to decide whether to include that symptom here. See IMPALA-13144. > Testcase fails at test_migrated_table_field_id_resolution due to "Table does > not exist" > --- > > Key: IMPALA-12967 > URL: https://issues.apache.org/jira/browse/IMPALA-12967 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Yida Wu >Assignee: Quanlong Huang >Priority: Major > Labels: broken-build > > Testcase test_migrated_table_field_id_resolution fails at exhaustive release > build with following messages: > *Regression* > {code:java} > query_test.test_iceberg.TestIcebergTable.test_migrated_table_field_id_resolution[protocol: > beeswax | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none] (from pytest) > {code} > *Error Message* > {code:java} > query_test/test_iceberg.py:266: in test_migrated_table_field_id_resolution > "iceberg_migrated_alter_test_orc", "orc") common/file_utils.py:68: in > create_iceberg_table_from_directory file_format)) > common/impala_connection.py:215: in execute > fetch_profile_after_close=fetch_profile_after_close) > beeswax/impala_beeswax.py:191: in execute handle = > self.__execute_query(query_string.strip(), user=user) > beeswax/impala_beeswax.py:384: in __execute_query > self.wait_for_finished(handle) beeswax/impala_beeswax.py:405: in > wait_for_finished raise ImpalaBeeswaxException("Query aborted:" + > error_log, None) E ImpalaBeeswaxException: ImpalaBeeswaxException: E > Query aborted:ImpalaRuntimeException: Error making 'createTable' RPC to Hive > Metastore: E CAUSED BY: 
IcebergTableLoadingException: Table does not exist > at location: > hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test_orc > Stacktrace > query_test/test_iceberg.py:266: in test_migrated_table_field_id_resolution > "iceberg_migrated_alter_test_orc", "orc") > common/file_utils.py:68: in create_iceberg_table_from_directory > file_format)) > common/impala_connection.py:215: in execute > fetch_profile_after_close=fetch_profile_after_close) > beeswax/impala_beeswax.py:191: in execute > handle = self.__execute_query(query_string.strip(), user=user) > beeswax/impala_beeswax.py:384: in __execute_query > self.wait_for_finished(handle) > beeswax/impala_beeswax.py:405: in wait_for_finished > raise ImpalaBeeswaxException("Query aborted:" + error_log, None) > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EQuery aborted:ImpalaRuntimeException: Error making 'createTable' RPC to > Hive Metastore: > E CAUSED BY: IcebergTableLoadingException: Table does not exist at > location: > hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test_orc > {code} > *Standard Error* > {code:java} > SET > client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_migrated_table_field_id_resolution[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':True;'abort_on_error':1;'exec_single_; > SET sync_ddl=False; > -- executing against localhost:21000 > DROP DATABASE IF EXISTS `test_migrated_table_field_id_resolution_b59d79db` > CASCADE; > -- 2024-04-02 00:56:55,137 INFO MainThread: Started query > f34399a8b7cddd67:031a3b96 > SET > client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_migrated_table_field_id_resolution[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':True;'abort_on_error':1;'exec_single_; > SET sync_ddl=False; > -- executing against localhost:21000 > CREATE DATABASE 
`test_migrated_table_field_id_resolution_b59d79db`; > -- 2024-04-02 00:56:57,302 INFO MainThread: Started query > 94465af69907eac5:e33f17e0 > -- 2024-04-02 00:56:57,353 INFO MainThread: Created database > "test_migrated_table_field_id_resolution_b59d79db" for test ID > "query_test/test_iceber
[jira] [Commented] (IMPALA-13144) TestIcebergTable.test_migrated_table_field_id_resolution fails with Disk I/O error
[ https://issues.apache.org/jira/browse/IMPALA-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853223#comment-17853223 ] Joe McDonnell commented on IMPALA-13144: We need to decide whether we want to track this with IMPALA-12967 (which was originally about "Table does not exist at location" on the same test) or keep it separate. > TestIcebergTable.test_migrated_table_field_id_resolution fails with Disk I/O > error > -- > > Key: IMPALA-13144 > URL: https://issues.apache.org/jira/browse/IMPALA-13144 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.5.0 >Reporter: Joe McDonnell >Priority: Critical > Labels: broken-build, flaky > > A couple test jobs hit a failure on > TestIcebergTable.test_migrated_table_field_id_resolution: > {noformat} > query_test/test_iceberg.py:270: in test_migrated_table_field_id_resolution > vector, unique_database) > common/impala_test_suite.py:725: in run_test_case > result = exec_fn(query, user=test_section.get('USER', '').strip() or None) > common/impala_test_suite.py:660: in __exec_in_impala > result = self.__execute_query(target_impalad_client, query, user=user) > common/impala_test_suite.py:1013: in __execute_query > return impalad_client.execute(query, user=user) > common/impala_connection.py:216: in execute > fetch_profile_after_close=fetch_profile_after_close) > beeswax/impala_beeswax.py:191: in execute > handle = self.__execute_query(query_string.strip(), user=user) > beeswax/impala_beeswax.py:384: in __execute_query > self.wait_for_finished(handle) > beeswax/impala_beeswax.py:405: in wait_for_finished > raise ImpalaBeeswaxException("Query aborted:" + error_log, None) > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EQuery aborted:Disk I/O error on > impala-ec2-centos79-m6i-4xlarge-xldisk-153e.vpc.cloudera.com:27000: Failed to > open HDFS file > hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test/00_0 > E Error(2): No such file or directory > E Root 
cause: RemoteException: File does not exist: > /test-warehouse/iceberg_migrated_alter_test/00_0 > E at > org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87) > E at > org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77) > E at > org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159) > E at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040) > E at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738) > E at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454) > E at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > E at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533) > E at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) > E at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994) > E at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922) > E at java.security.AccessController.doPrivileged(Native Method) > E at javax.security.auth.Subject.doAs(Subject.java:422) > E at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > E at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899){noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13144) TestIcebergTable.test_migrated_table_field_id_resolution fails with Disk I/O error
Joe McDonnell created IMPALA-13144: -- Summary: TestIcebergTable.test_migrated_table_field_id_resolution fails with Disk I/O error Key: IMPALA-13144 URL: https://issues.apache.org/jira/browse/IMPALA-13144 Project: IMPALA Issue Type: Bug Components: Backend Affects Versions: Impala 4.5.0 Reporter: Joe McDonnell A couple test jobs hit a failure on TestIcebergTable.test_migrated_table_field_id_resolution: {noformat} query_test/test_iceberg.py:270: in test_migrated_table_field_id_resolution vector, unique_database) common/impala_test_suite.py:725: in run_test_case result = exec_fn(query, user=test_section.get('USER', '').strip() or None) common/impala_test_suite.py:660: in __exec_in_impala result = self.__execute_query(target_impalad_client, query, user=user) common/impala_test_suite.py:1013: in __execute_query return impalad_client.execute(query, user=user) common/impala_connection.py:216: in execute fetch_profile_after_close=fetch_profile_after_close) beeswax/impala_beeswax.py:191: in execute handle = self.__execute_query(query_string.strip(), user=user) beeswax/impala_beeswax.py:384: in __execute_query self.wait_for_finished(handle) beeswax/impala_beeswax.py:405: in wait_for_finished raise ImpalaBeeswaxException("Query aborted:" + error_log, None) E ImpalaBeeswaxException: ImpalaBeeswaxException: EQuery aborted:Disk I/O error on impala-ec2-centos79-m6i-4xlarge-xldisk-153e.vpc.cloudera.com:27000: Failed to open HDFS file hdfs://localhost:20500/test-warehouse/iceberg_migrated_alter_test/00_0 E Error(2): No such file or directory E Root cause: RemoteException: File does not exist: /test-warehouse/iceberg_migrated_alter_test/00_0 E at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87) E at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77) E at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159) E at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040) E at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738) E at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454) E at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) E at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533) E at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) E at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994) E at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922) E at java.security.AccessController.doPrivileged(Native Method) E at javax.security.auth.Subject.doAs(Subject.java:422) E at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) E at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899){noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13143) TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query failure
Joe McDonnell created IMPALA-13143: -- Summary: TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query failure Key: IMPALA-13143 URL: https://issues.apache.org/jira/browse/IMPALA-13143 Project: IMPALA Issue Type: Bug Components: Backend Affects Versions: Impala 4.5.0 Reporter: Joe McDonnell The new TestCatalogdHA.test_catalogd_failover_with_sync_ddl test is failing intermittently with: {noformat} custom_cluster/test_catalogd_ha.py:472: in test_catalogd_failover_with_sync_ddl self.wait_for_state(handle, QueryState.EXCEPTION, 30, client=client) common/impala_test_suite.py:1216: in wait_for_state self.wait_for_any_state(handle, [expected_state], timeout, client) common/impala_test_suite.py:1234: in wait_for_any_state raise Timeout(timeout_msg) E Timeout: query '9d49ab6360f6cbc5:4826a796' did not reach one of the expected states [5], last known state 4{noformat} This means the query succeeded even though we expected it to fail. This is currently limited to s3 jobs. In a different test, we saw issues because s3 is slower (see IMPALA-12616). This test was introduced by IMPALA-13134: https://github.com/apache/impala/commit/70b7b6a78d49c30933d79e0a1c2a725f7e0a3e50 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states
[ https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe McDonnell resolved IMPALA-12616. Fix Version/s: Impala 4.5.0 Resolution: Fixed I think the s3 slowness version of this is fixed, so I'm going to resolve this. > test_restart_catalogd_while_handling_rpc_response* tests fail not reaching > expected states > -- > > Key: IMPALA-12616 > URL: https://issues.apache.org/jira/browse/IMPALA-12616 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 1.4.2 >Reporter: Andrew Sherman >Assignee: Daniel Becker >Priority: Critical > Fix For: Impala 4.5.0 > > > There are failures in both > custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout > and > custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters, > both look the same: > {code:java} > custom_cluster/test_restart_services.py:232: in > test_restart_catalogd_while_handling_rpc_response_with_timeout > self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], > max_wait_time) > common/impala_test_suite.py:1181: in wait_for_state > self.wait_for_any_state(handle, [expected_state], timeout, client) > common/impala_test_suite.py:1199: in wait_for_any_state > raise Timeout(timeout_msg) > E Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of > the expected states [4], last known state 5 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12322) return wrong timestamp when scan kudu timestamp with timezone
[ https://issues.apache.org/jira/browse/IMPALA-12322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853203#comment-17853203 ] Csaba Ringhofer commented on IMPALA-12322: -- Thanks for the feedback[~eyizoha]. I have uploaded a patch that adds a new query option: https://gerrit.cloudera.org/#/c/21492/ > return wrong timestamp when scan kudu timestamp with timezone > - > > Key: IMPALA-12322 > URL: https://issues.apache.org/jira/browse/IMPALA-12322 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.1.1 > Environment: impala 4.1.1 >Reporter: daicheng >Assignee: Zihao Ye >Priority: Major > Attachments: image-2022-04-24-00-01-05-746-1.png, > image-2022-04-24-00-01-05-746.png, image-2022-04-24-00-01-37-520.png, > image-2022-04-24-00-03-14-467-1.png, image-2022-04-24-00-03-14-467.png, > image-2022-04-24-00-04-16-240-1.png, image-2022-04-24-00-04-16-240.png, > image-2022-04-24-00-04-52-860-1.png, image-2022-04-24-00-04-52-860.png, > image-2022-04-24-00-05-52-086-1.png, image-2022-04-24-00-05-52-086.png, > image-2022-04-24-00-07-09-776-1.png, image-2022-04-24-00-07-09-776.png, > image-2023-07-28-20-31-09-457.png, image-2023-07-28-22-27-38-521.png, > image-2023-07-28-22-29-40-083.png, image-2023-07-28-22-36-17-460.png, > image-2023-07-28-22-36-37-884.png, image-2023-07-28-22-38-19-728.png > > > impala version is 3.1.0-cdh6.1 > i have set system timezone=Asia/Shanghai: > !image-2022-04-24-00-01-37-520.png! > !image-2022-04-24-00-01-05-746.png! > here is the bug: > *step 1* > i have parquet file with two columns like below,and read it with impala-shell > and spark (timezone=shanghai) > !image-2022-04-24-00-03-14-467.png|width=1016,height=154! > !image-2022-04-24-00-04-16-240.png|width=944,height=367! 
> the results are both exactly right. > *step 2* > create kudu table with impala-shell: > CREATE TABLE default.test_{_}test{_}_test_time2 (id BIGINT,t > TIMESTAMP,PRIMARY KEY (id) ) STORED AS KUDU; > note: kudu version: 1.8 > and insert 2 rows into the table with spark: > !image-2022-04-24-00-04-52-860.png|width=914,height=279! > *step 3* > read it with spark (timezone=shanghai); spark reads the kudu table with the kudu-client > api. here is the result: > !image-2022-04-24-00-05-52-086.png|width=914,height=301! > the result is still exactly right. > but read it with impala-shell: > !image-2022-04-24-00-07-09-776.png|width=915,height=154! > the result is 8 hours late > *conclusion* > it seems like the impala timezone didn't work when the kudu column type is > timestamp, but it works fine for parquet files. I don't know why.
[jira] [Commented] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states
[ https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853196#comment-17853196 ] ASF subversion and git services commented on IMPALA-12616: -- Commit 1935f9e1a199c958c5fb12ad53277fa720d6ae5c in impala's branch refs/heads/master from Joe McDonnell [ https://gitbox.apache.org/repos/asf?p=impala.git;h=1935f9e1a ] IMPALA-12616: Fix test_restart_services.py::TestRestart tests for S3 The test_restart_catalogd_while_handling_rpc_response* tests from custom_cluster/test_restart_services.py have been failing consistently on s3. The alter table statement is expected to succeed, but instead it fails with: "CatalogException: Detected catalog service ID changes" This manifests as a timeout waiting for the statement to reach the finished state. The test relies on specific timing with a sleep injected via a debug action. The failure stems from the catalog being slower on s3. The alter table wakes up before the catalog service ID change has fully completed, and it fails when it sees the catalog service ID change. This increases two sleep times: 1. This increases the sleep time before restarting the catalogd from 0.5 seconds to 5 seconds. This gives the catalogd longer to receive the message about the alter table and respond back to the impalad. 2. This increases the WAIT_BEFORE_PROCESSING_CATALOG_UPDATE sleep from 10 seconds to 30 seconds so the alter table statement doesn't wake up until the catalog service ID change is finalized. The test is verifying that the right messages are in the impalad logs, so we know this is still testing the same condition. This modifies the tests to use wait_for_finished_timeout() rather than wait_for_state(). This bails out immediately if the query fails rather than waiting unnecessarily for the full timeout. This also clears the query options so that later statements don't inherit the debug_action that the alter table statement used. 
Testing: - Ran the tests 100x in a loop on s3 - Ran the tests 100x in a loop on HDFS Change-Id: Ieb5699b8fb0b2ad8bad4ac30922a7b4d7fa17d29 Reviewed-on: http://gerrit.cloudera.org:8080/21485 Tested-by: Impala Public Jenkins Reviewed-by: Daniel Becker > test_restart_catalogd_while_handling_rpc_response* tests fail not reaching > expected states > -- > > Key: IMPALA-12616 > URL: https://issues.apache.org/jira/browse/IMPALA-12616 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 1.4.2 >Reporter: Andrew Sherman >Assignee: Daniel Becker >Priority: Critical > > There are failures in both > custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout > and > custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters, > both look the same: > {code:java} > custom_cluster/test_restart_services.py:232: in > test_restart_catalogd_while_handling_rpc_response_with_timeout > self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], > max_wait_time) > common/impala_test_suite.py:1181: in wait_for_state > self.wait_for_any_state(handle, [expected_state], timeout, client) > common/impala_test_suite.py:1199: in wait_for_any_state > raise Timeout(timeout_msg) > E Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of > the expected states [4], last known state 5 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
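The switch from wait_for_state() to wait_for_finished_timeout() can be illustrated with a small sketch (a hypothetical poll loop modeled on the described change, not the actual ImpalaTestSuite code; the state names are illustrative): poll for FINISHED, but bail out as soon as the query fails instead of sleeping through the full timeout.

```python
import time

def wait_for_finished_timeout(get_state, timeout_s, poll_interval_s=0.05):
    """Poll get_state() until it returns FINISHED; return False immediately
    if the query enters EXCEPTION, rather than waiting out the whole timeout.

    Hypothetical helper sketching the test-suite behavior described above.
    """
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        state = get_state()
        if state == "FINISHED":
            return True
        if state == "EXCEPTION":
            return False  # query failed: bail out right away
        time.sleep(poll_interval_s)
    return False
```

The point of the change is the early return: a failing query is reported after one poll interval instead of after the full max_wait_time.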
[jira] [Created] (IMPALA-13142) Documentation for Impala StateStore HA
Sanjana Malhotra created IMPALA-13142: - Summary: Documentation for Impala StateStore HA Key: IMPALA-13142 URL: https://issues.apache.org/jira/browse/IMPALA-13142 Project: IMPALA Issue Type: Documentation Reporter: Sanjana Malhotra Assignee: Sanjana Malhotra IMPALA-12156
[jira] [Commented] (IMPALA-13137) Add additional client fetch metrics columns to the queries page
[ https://issues.apache.org/jira/browse/IMPALA-13137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853096#comment-17853096 ] Surya Hebbar commented on IMPALA-13137: --- It was confirmed in the meeting that the expected column was the {{{}ClientFetchWaitTimer{}}}'s value and not the difference between "First row fetched" and "Last row fetched". > Add additional client fetch metrics columns to the queries page > --- > > Key: IMPALA-13137 > URL: https://issues.apache.org/jira/browse/IMPALA-13137 > Project: IMPALA > Issue Type: New Feature > Components: Backend, be >Reporter: Surya Hebbar >Assignee: Surya Hebbar >Priority: Major > Attachments: completed_query.png, in_flight_query_1.png, > in_flight_query_2.png, in_flight_query_3.png, very_short_fetch_timer.png > > > To help users better understand query execution times, it would be > helpful to add the following columns on the queries page. > * First row fetched time - Time taken for the client to fetch the first row > * Client fetch wait time - Time taken for the client to fetch all rows > Additional details - > https://jira.cloudera.com/browse/DWX-18295
[jira] [Updated] (IMPALA-13141) Partition transactional table is not updated on alter partition when hms_event_incremental_refresh_transactional_table is disabled
[ https://issues.apache.org/jira/browse/IMPALA-13141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venugopal Reddy K updated IMPALA-13141: --- Description: Partition of a transactional table is not updated on alter partition when hms_event_incremental_refresh_transactional_table is disabled.
*Observations:*
1. In case of AlterPartitionEvent, this issue occurs when hms_event_incremental_refresh_transactional_table is disabled.
2. In case of BatchPartitionEvent (when more than one AlterPartitionEvent is batched together), this issue occurs without disabling hms_event_incremental_refresh_transactional_table.
*Steps to reproduce:*
1. Create a partitioned table and add some partitions from Hive. Note: this step can be done from Impala too.
{code:java}
0: jdbc:hive2://localhost:11050> create table s(i int, j int, p int);
0: jdbc:hive2://localhost:11050> insert into s values(1,10,100),(2,20,200);
{code}
{code:java}
0: jdbc:hive2://localhost:11050> create table test(i int, j int) partitioned by(p int) tblproperties ('transactional'='true', 'transactional_properties'='insert_only');
0: jdbc:hive2://localhost:11050> set hive.exec.dynamic.partition.mode=nonstrict;
0: jdbc:hive2://localhost:11050> insert into test partition(p) select * from s;
0: jdbc:hive2://localhost:11050> show partitions test;
+------------+
| partition  |
+------------+
| p=100      |
| p=200      |
+------------+
0: jdbc:hive2://localhost:11050> desc formatted test partition(p=100);
| col_name | data_type | comment |
| i | int | |
| j | int | |
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| p | int | |
| | NULL | NULL |
| # Detailed Partition Information | NULL | NULL |
| Partition Value: | [100] | NULL |
| Database: | default | NULL |
| Table: | test | NULL |
| CreateTime: | Fri Jun 07 14:21:17 IST 2024 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Location: | hdfs://localhost:20500/test-warehouse/managed/test/p=100 | NULL |
| Partition Parameters: | NULL | NULL |
| | numFiles | 1 |
| | totalSize | 5 |
| | transient_lastDdlTime | 1717750277 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL
[jira] [Created] (IMPALA-13141) Partition transactional table is not updated on alter partition when hms_event_incremental_refresh_transactional_table is disabled
Venugopal Reddy K created IMPALA-13141: -- Summary: Partition transactional table is not updated on alter partition when hms_event_incremental_refresh_transactional_table is disabled Key: IMPALA-13141 URL: https://issues.apache.org/jira/browse/IMPALA-13141 Project: IMPALA Issue Type: Bug Reporter: Venugopal Reddy K Partition of a transactional table is not updated on alter partition when hms_event_incremental_refresh_transactional_table is disabled.
*Observations:*
1. In case of AlterPartitionEvent, this issue occurs when hms_event_incremental_refresh_transactional_table is disabled.
2. In case of BatchPartitionEvent (when more than one AlterPartitionEvent is batched together), this issue occurs without disabling hms_event_incremental_refresh_transactional_table.
*Steps to reproduce:*
1. Create a partitioned table and add some partitions from Hive. Note: this step can be done from Impala too.
{code:java}
0: jdbc:hive2://localhost:11050> create table s(i int, j int, p int);
0: jdbc:hive2://localhost:11050> insert into s values(1,10,100),(2,20,200);
{code}
{code:java}
0: jdbc:hive2://localhost:11050> create table test(i int, j int) partitioned by(p int) tblproperties ('transactional'='true', 'transactional_properties'='insert_only');
0: jdbc:hive2://localhost:11050> set hive.exec.dynamic.partition.mode=nonstrict;
0: jdbc:hive2://localhost:11050> insert into test partition(p) select * from s;
0: jdbc:hive2://localhost:11050> show partitions test;
+------------+
| partition  |
+------------+
| p=100      |
| p=200      |
+------------+
0: jdbc:hive2://localhost:11050> desc formatted test partition(p=100);
| col_name | data_type | comment |
| i | int | |
| j | int | |
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| p | int | |
| | NULL | NULL |
| # Detailed Partition Information | NULL | NULL |
| Partition Value: | [100] | NULL |
| Database: | default | NULL |
| Table: | test | NULL |
| CreateTime: | Fri Jun 07 14:21:17 IST 2024 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Location: | hdfs://localhost:20500/test-warehouse/managed/test/p=100 | NULL |
| Partition Parameters: | NULL | NULL |
| | numFiles | 1 |
| | totalSize | 5 |
| | transient_lastDdlTime | 1717750277 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets:
[jira] [Assigned] (IMPALA-13140) Add backend flag to disable small string optimization
[ https://issues.apache.org/jira/browse/IMPALA-13140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy reassigned IMPALA-13140: -- Assignee: Zoltán Borók-Nagy > Add backend flag to disable small string optimization > - > > Key: IMPALA-13140 > URL: https://issues.apache.org/jira/browse/IMPALA-13140 > Project: IMPALA > Issue Type: Sub-task >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Critical > > We could have a backend flag that would make SmallableString::Smallify() a > no-op.
[jira] [Created] (IMPALA-13140) Add backend flag to disable small string optimization
Zoltán Borók-Nagy created IMPALA-13140: -- Summary: Add backend flag to disable small string optimization Key: IMPALA-13140 URL: https://issues.apache.org/jira/browse/IMPALA-13140 Project: IMPALA Issue Type: Sub-task Reporter: Zoltán Borók-Nagy We could have a backend flag that would make SmallableString::Smallify() a no-op.
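The proposed flag would simply short-circuit smallification so every string stays in its heap representation. A minimal Python sketch of that idea (the real code is C++ in SmallableString; the names, the tuple representation, and the 7-byte inline threshold here are all illustrative assumptions):

```python
SMALL_STRING_MAX_LEN = 7  # illustrative inline capacity, not Impala's actual value

def smallify(value, disable=False):
    """Return (storage, value). 'disable' stands in for the proposed backend
    flag: when set, Smallify() becomes a no-op and everything stays on the heap."""
    if disable or len(value) > SMALL_STRING_MAX_LEN:
        return ("heap", value)
    return ("inline", value)
```

Running the test suite with the flag forced on would exercise the long-string (heap) code paths even for the short strings that dominate existing test tables, which is exactly the coverage gap IMPALA-12569 describes.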
[jira] [Commented] (IMPALA-13130) Under heavy load, Impala does not prioritize data stream operations
[ https://issues.apache.org/jira/browse/IMPALA-13130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853074#comment-17853074 ] ASF subversion and git services commented on IMPALA-13130: -- Commit 3f827bfc2447d8c11a4f09bcb96e86c53b92d753 in impala's branch refs/heads/master from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=3f827bfc2 ] IMPALA-13130: Prioritize EndDataStream messages Prioritize EndDataStream messages over other types handled by DataStreamService, and avoid rejecting them when memory limit is reached. They take very little memory (~75 bytes) and will usually help reduce memory use by closing out in-progress operations. Adds the 'data_stream_sender_eos_timeout_ms' flag to control EOS timeouts. Defaults to 1 hour, and can be disabled by setting to -1. Adds unit tests ensuring EOS are processed even if mem limit is reached and ahead of TransmitData messages in the queue. Change-Id: I2829e1ab5bcde36107e10bff5fe629c5ee60f3e8 Reviewed-on: http://gerrit.cloudera.org:8080/21476 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Under heavy load, Impala does not prioritize data stream operations > --- > > Key: IMPALA-13130 > URL: https://issues.apache.org/jira/browse/IMPALA-13130 > Project: IMPALA > Issue Type: Bug >Reporter: Michael Smith >Assignee: Michael Smith >Priority: Major > > Under heavy load - where Impala reaches max memory for the DataStreamService > and applies backpressure via > https://github.com/apache/impala/blob/4.4.0/be/src/rpc/impala-service-pool.cc#L191-L199 > - DataStreamService does not differentiate between types of requests and may > reject requests that could help reduce load. > The DataStreamService deals with TransmitData, PublishFilter, UpdateFilter, > UpdateFilterFromRemote, and EndDataStream. It seems like we should prioritize > completing EndDataStream, especially under heavy load, to complete work and > release resources more quickly. 
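The two-part fix (priority ordering plus a memory-limit exemption for EndDataStream) can be sketched with a toy queue. This is an illustration of the technique, not Impala's DataStreamService (which is C++ over KRPC); the class, priorities, and sizes are assumptions:

```python
import heapq

# Lower value = served first; EOS jumps ahead of everything else.
PRIORITY = {"EndDataStream": 0, "TransmitData": 1, "PublishFilter": 1,
            "UpdateFilter": 1, "UpdateFilterFromRemote": 1}

class RpcQueue:
    def __init__(self, mem_limit):
        self.mem_limit = mem_limit
        self.mem_used = 0
        self._heap = []
        self._seq = 0  # tie-breaker keeps FIFO order within a priority level

    def offer(self, rpc_type, size):
        """Admit an RPC; reject on memory pressure unless it is an EOS,
        which is tiny and helps release resources."""
        if rpc_type != "EndDataStream" and self.mem_used + size > self.mem_limit:
            return False
        self.mem_used += size
        heapq.heappush(self._heap, (PRIORITY[rpc_type], self._seq, rpc_type, size))
        self._seq += 1
        return True

    def take(self):
        """Serve the highest-priority pending RPC."""
        _, _, rpc_type, size = heapq.heappop(self._heap)
        self.mem_used -= size
        return rpc_type
```

Under backpressure, an EndDataStream both bypasses the admission check and is dequeued ahead of queued TransmitData messages, so in-progress streams can close out and free memory.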
[jira] [Updated] (IMPALA-12569) Harden long string testing
[ https://issues.apache.org/jira/browse/IMPALA-12569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-12569: --- Priority: Critical (was: Major) > Harden long string testing > -- > > Key: IMPALA-12569 > URL: https://issues.apache.org/jira/browse/IMPALA-12569 > Project: IMPALA > Issue Type: Sub-task > Components: Backend, Infrastructure >Reporter: Zoltán Borók-Nagy >Priority: Critical > > With small string optimization, [~csringhofer] pointed out that most of our > test data have small strings. And new features are typically tested on the > existing test tables (e.g. alltypes that only have small strings), or they > add new tests with usually small strings only. The latter is hard to prevent. > Therefore the long strings might have less test coverage if we don't pay > enough attention. > To make the situation better, we could > # Add long string data to the string column of alltypes table and > complextypestbl and update the tests > # Add a backend flag that makes StringValue.Smallify() a no-op, and create a > test job (probably with an ASAN build) that runs the tests with that flag > turned on.
[jira] [Updated] (IMPALA-13139) Query options set via ImpalaTestSuite::execute_query_expect_success stay set for subsequent queries
[ https://issues.apache.org/jira/browse/IMPALA-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe McDonnell updated IMPALA-13139: --- Description: When debugging TestRestart, I noticed that the debug_action set for one query stayed in effect for subsequent queries that didn't specify query_options.
{noformat}
DEBUG_ACTION = ("WAIT_BEFORE_PROCESSING_CATALOG_UPDATE:SLEEP@{}"
                .format(debug_action_sleep_time_sec * 1000))
query = "alter table {} add columns (age int)".format(tbl_name)
handle = self.execute_query_async(query, query_options={"debug_action": DEBUG_ACTION})
...
# debug_action is still set for these queries:
self.execute_query_expect_success(self.client, "select age from {}".format(tbl_name))
self.execute_query_expect_success(self.client,
    "alter table {} add columns (name string)".format(tbl_name))
self.execute_query_expect_success(self.client, "select name from {}".format(tbl_name))
{noformat}
There is a way to clear the query options (self.client.clear_configuration()), but this is an odd behavior. It's unclear if some tests rely on this behavior. > Query options set via ImpalaTestSuite::execute_query_expect_success stay set > for subsequent queries > --- > > Key: IMPALA-13139 > URL: https://issues.apache.org/jira/browse/IMPALA-13139 > Project: IMPALA > Issue Type: Task > Components: Infrastructure >Affects Versions: Impala 4.5.0 >Reporter: Joe McDonnell >Priority: Major > > When debugging TestRestart, I noticed that the debug_action set for one query > stayed in effect for subsequent queries that didn't specify query_options. > {noformat} > DEBUG_ACTION = ("WAIT_BEFORE_PROCESSING_CATALOG_UPDATE:SLEEP@{}" > .format(debug_action_sleep_time_sec * 1000)) > query = "alter table {} add columns (age int)".format(tbl_name) > handle = self.execute_query_async(query, query_options={"debug_action": > DEBUG_ACTION}) > ...
> # debug_action is still set for these queries: > self.execute_query_expect_success(self.client, "select age from > {}".format(tbl_name)) > self.execute_query_expect_success(self.client, > "alter table {} add columns (name string)".format(tbl_name)) > self.execute_query_expect_success(self.client, "select name from > {}".format(tbl_name)){noformat} > There is a way to clear the query options > (self.client.clear_configuration()), but this is an odd behavior. It's > unclear if some tests rely on this behavior.
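The sticky-option behavior can be modeled in a few lines. This is a toy stand-in for the test framework's client (the class and method bodies are assumptions; clear_configuration() is the remedy the report itself mentions):

```python
class FakeClient:
    """Toy model of the behavior described above: per-query options are
    merged into the session configuration and stay set afterwards."""

    def __init__(self):
        self.configuration = {}

    def execute(self, query, query_options=None):
        if query_options:
            # Options silently persist into the session for later statements.
            self.configuration.update(query_options)
        return dict(self.configuration)  # options in effect for this query

    def clear_configuration(self):
        """Reset session options so later queries don't inherit them."""
        self.configuration.clear()
```

With this model, a debug_action passed for one statement is still in effect for the next plain execute() call until clear_configuration() is invoked, which mirrors the surprise described in the issue.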
[jira] [Created] (IMPALA-13139) Query options set via ImpalaTestSuite::execute_query_expect_success stay set for subsequent queries
Joe McDonnell created IMPALA-13139: -- Summary: Query options set via ImpalaTestSuite::execute_query_expect_success stay set for subsequent queries Key: IMPALA-13139 URL: https://issues.apache.org/jira/browse/IMPALA-13139 Project: IMPALA Issue Type: Task Components: Infrastructure Affects Versions: Impala 4.5.0 Reporter: Joe McDonnell
[jira] [Commented] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states
[ https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852961#comment-17852961 ] Joe McDonnell commented on IMPALA-12616: This is looking timing-related. I was able to get this to pass by adjusting some of the sleep times. Basically, it looks like the catalog is slower on s3 and some operations don't finish in the time we thought they would.
{noformat}
debug_action_sleep_time_sec = 10 (NEW: 30)
DEBUG_ACTION = ("WAIT_BEFORE_PROCESSING_CATALOG_UPDATE:SLEEP@{}"
                .format(debug_action_sleep_time_sec * 1000))
query = "alter table {} add columns (age int)".format(tbl_name)
handle = self.execute_query_async(query, query_options={"debug_action": DEBUG_ACTION})
# Wait a bit so the RPC from the catalogd arrives to the coordinator.
time.sleep(0.5) (NEW: 5)
self.cluster.catalogd.restart()
# Wait for the query to finish.
max_wait_time = (debug_action_sleep_time_sec + self.WAIT_FOR_CATALOG_UPDATE_TIMEOUT_SEC + 10)
self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], max_wait_time)
{noformat}
A successful timeline looks like this:
# Submit an alter table that sleeps before processing the catalog update
# Sleep a little bit so the catalog knows about the alter table
# Restart the catalogd
# The catalog sends an update via the statestore. This has the new catalog ID and causes this message: "There was an error processing the impalad catalog update. Requesting a full topic update to recover: CatalogException: Detected catalog service ID changes from 9c9f7ff13f0e4f72:a896bee4d52fd37e to da67610b2c304198:a05daf1bc3d6a4b3. Aborting updateCatalog()"
# The catalogd sends a full topic update
# The alter table wakes up and prints this message: Catalog service ID mismatch. Current ID: da67610b2c304198:a05daf1bc3d6a4b3. ID in response: 9c9f7ff13f0e4f72:a896bee4d52fd37e. Catalogd may have been restarted. Waiting for new catalog update from statestore.
# Either it times out or there are too many non-empty updates, and the alter table bails out with "W0506 22:42:10.316627 23066 impala-server.cc:2369] e14b23a22458ab75:6b269414] Ignoring catalog update result of catalog service ID 9c9f7ff13f0e4f72:a896bee4d52fd37e because it does not match with current catalog service ID da67610b2c304198:a05daf1bc3d6a4b3. The current catalog service ID may be stale (this may be caused by the catalogd having been restarted more than once) or newer than the catalog service ID of the update result." If the alter table wakes up from its sleep before #5 happens, the alter table will see the catalog service ID change and fail. To avoid that, we adjust the WAIT_BEFORE_PROCESSING_CATALOG_UPDATE higher. I also lengthened the sleep in #2 to give the initial catalog some extra time to hear about the alter table. The test verifies that the logs contain the expected messages, so this should be a safe modification to the test. > test_restart_catalogd_while_handling_rpc_response* tests fail not reaching > expected states > -- > > Key: IMPALA-12616 > URL: https://issues.apache.org/jira/browse/IMPALA-12616 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 1.4.2 >Reporter: Andrew Sherman >Assignee: Daniel Becker >Priority: Critical > > There are failures in both > custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout > and > custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters, > both look the same: > {code:java} > custom_cluster/test_restart_services.py:232: in > test_restart_catalogd_while_handling_rpc_response_with_timeout > self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], > max_wait_time) > common/impala_test_suite.py:1181: in wait_for_state > self.wait_for_any_state(handle, [expected_state], timeout, client) > common/impala_test_suite.py:1199: in wait_for_any_state > raise 
Timeout(timeout_msg) > E Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of > the expected states [4], last known state 5 > {code}
[jira] [Comment Edited] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap
[ https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852918#comment-17852918 ] Michael Smith edited comment on IMPALA-12800 at 6/6/24 7:24 PM: Difference from null slots cache:
{code}
# With caching
Query Compilation: 4s678ms
 - Metadata of all 1 tables cached: 26.276ms (26.276ms)
 - Analysis finished: 3s466ms (3s440ms)
 - Authorization finished (noop): 3s467ms (130.395us)
 - Value transfer graph computed: 3s486ms (19.860ms)
 - Single node plan created: 4s402ms (915.149ms)
 - Runtime filters computed: 4s453ms (51.628ms)
 - Distributed plan created: 4s486ms (33.064ms)
 - Planning finished: 4s678ms (191.281ms)

# Without caching via 'set use_null_slots_cache=false'
Query Compilation: 14s845ms
 - Metadata of all 1 tables cached: 7.608ms (7.608ms)
 - Analysis finished: 3s207ms (3s199ms)
 - Authorization finished (noop): 3s207ms (120.606us)
 - Value transfer graph computed: 3s221ms (14.231ms)
 - Single node plan created: 14s610ms (11s389ms)
 - Runtime filters computed: 14s661ms (51.286ms)
 - Distributed plan created: 14s662ms (246.301us)
 - Planning finished: 14s845ms (183.164ms)
{code}
So speeds up single node planning, adds some overhead to distributed planning. I'll look into disabling it for distributed planning. Update: the time to produce cache logging was actually being lumped into "Distributed plan created", so that extra 30ms is from debug logging in logCacheStats.
was (Author: JIRAUSER288956): Difference from null slots cache:
{code}
# With caching
Query Compilation: 4s678ms
 - Metadata of all 1 tables cached: 26.276ms (26.276ms)
 - Analysis finished: 3s466ms (3s440ms)
 - Authorization finished (noop): 3s467ms (130.395us)
 - Value transfer graph computed: 3s486ms (19.860ms)
 - Single node plan created: 4s402ms (915.149ms)
 - Runtime filters computed: 4s453ms (51.628ms)
 - Distributed plan created: 4s486ms (33.064ms)
 - Planning finished: 4s678ms (191.281ms)

# Without caching via 'set use_null_slots_cache=false'
Query Compilation: 14s845ms
 - Metadata of all 1 tables cached: 7.608ms (7.608ms)
 - Analysis finished: 3s207ms (3s199ms)
 - Authorization finished (noop): 3s207ms (120.606us)
 - Value transfer graph computed: 3s221ms (14.231ms)
 - Single node plan created: 14s610ms (11s389ms)
 - Runtime filters computed: 14s661ms (51.286ms)
 - Distributed plan created: 14s662ms (246.301us)
 - Planning finished: 14s845ms (183.164ms)
{code}
So speeds up single node planning, adds some overhead to distributed planning. I'll look into disabling it for distributed planning. Update: the time to produce cache logging was actually being lumped into "Distributed plan created", so that extra 20s is from debug logging in logCacheStats. > Queries with many nested inline views see performance issues with > ExprSubstitutionMap > - > > Key: IMPALA-12800 > URL: https://issues.apache.org/jira/browse/IMPALA-12800 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 4.3.0 >Reporter: Joe McDonnell >Assignee: Michael Smith >Priority: Critical > Attachments: impala12800repro.sql, impala12800schema.sql, > long_query_jstacks.tar.gz > > > A user running a query with many layers of inline views saw a large amount of > time spent in analysis.
> > {noformat} > - Authorization finished (ranger): 7s518ms (13.134ms) > - Value transfer graph computed: 7s760ms (241.953ms) > - Single node plan created: 2m47s (2m39s) > - Distributed plan created: 2m47s (7.430ms) > - Lineage info computed: 2m47s (39.017ms) > - Planning finished: 2m47s (672.518ms){noformat} > In reproducing it locally, we found that most of the stacks end up in > ExprSubstitutionMap. > > Here are the main stacks seen while running jstack every 3 seconds during a > 75 second execution: > Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr > equals) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at java.util.ArrayList.indexOf(ArrayList.java:323) > at java.util.ArrayList.contains(ArrayList.java:306) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat} > Location 2: (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 > samples) > {noformat}
[jira] [Commented] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap
[ https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852918#comment-17852918 ] Michael Smith commented on IMPALA-12800: Difference from the null slots cache:
{code}
# With caching
Query Compilation: 4s678ms
 - Metadata of all 1 tables cached: 26.276ms (26.276ms)
 - Analysis finished: 3s466ms (3s440ms)
 - Authorization finished (noop): 3s467ms (130.395us)
 - Value transfer graph computed: 3s486ms (19.860ms)
 - Single node plan created: 4s402ms (915.149ms)
 - Runtime filters computed: 4s453ms (51.628ms)
 - Distributed plan created: 4s486ms (33.064ms)
 - Planning finished: 4s678ms (191.281ms)

# Without caching via 'set use_null_slots_cache=false'
Query Compilation: 14s845ms
 - Metadata of all 1 tables cached: 7.608ms (7.608ms)
 - Analysis finished: 3s207ms (3s199ms)
 - Authorization finished (noop): 3s207ms (120.606us)
 - Value transfer graph computed: 3s221ms (14.231ms)
 - Single node plan created: 14s610ms (11s389ms)
 - Runtime filters computed: 14s661ms (51.286ms)
 - Distributed plan created: 14s662ms (246.301us)
 - Planning finished: 14s845ms (183.164ms)
{code}
So the cache speeds up single node planning and adds some overhead to distributed planning. I'll look into disabling it for distributed planning. > Queries with many nested inline views see performance issues with > ExprSubstitutionMap > - > > Key: IMPALA-12800 > URL: https://issues.apache.org/jira/browse/IMPALA-12800 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 4.3.0 >Reporter: Joe McDonnell >Assignee: Michael Smith >Priority: Critical > Attachments: impala12800repro.sql, impala12800schema.sql, > long_query_jstacks.tar.gz > > > A user running a query with many layers of inline views saw a large amount of > time spent in analysis. 
> > {noformat} > - Authorization finished (ranger): 7s518ms (13.134ms) > - Value transfer graph computed: 7s760ms (241.953ms) > - Single node plan created: 2m47s (2m39s) > - Distributed plan created: 2m47s (7.430ms) > - Lineage info computed: 2m47s (39.017ms) > - Planning finished: 2m47s (672.518ms){noformat} > In reproducing it locally, we found that most of the stacks end up in > ExprSubstitutionMap. > > Here are the main stacks seen while running jstack every 3 seconds during a > 75 second execution: > Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr > equals) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at java.util.ArrayList.indexOf(ArrayList.java:323) > at java.util.ArrayList.contains(ArrayList.java:306) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat} > Location 2: (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 > samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at > org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat} > Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 > samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at > org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173) > at > org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat} > Location 4: (TupleIsNullPredicate.wrapExprs -> Analyzer.isTrueWithNullSlots > -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at java.lang.StringCoding.encode(StringCoding.java:364) > at java.lang.String.getBytes(String.java:941) > at > 
org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227) > at > org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532) > at > org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467) > at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394) > at > org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQuery
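The jstack samples above all bottom out in Expr.equals() called from ArrayList.contains()/indexOf() inside ExprSubstitutionMap, a linear scan per lookup that goes quadratic on large maps. A hedged sketch of the indexing idea behind the fix, modeling exprs as canonical strings — the class and methods are illustrative, not Impala's actual ExprSubstitutionMap API:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration of the hot spot in the stacks above: checking for a duplicate
// left-hand expr with ArrayList.contains() invokes equals() on every element,
// O(n) per insert and O(n^2) overall. Maintaining a parallel HashSet of the
// left-hand sides makes the membership test O(1) on average.
public class SubstitutionMapSketch {
  private final List<String> lhs = new ArrayList<>();
  private final List<String> rhs = new ArrayList<>();
  private final Set<String> lhsIndex = new HashSet<>(); // avoids linear scans

  /** Adds a mapping unless an entry for 'from' already exists. */
  public void put(String from, String to) {
    if (lhsIndex.add(from)) { // O(1) membership test instead of lhs.contains()
      lhs.add(from);
      rhs.add(to);
    }
  }

  public int size() { return lhs.size(); }

  public static void main(String[] args) {
    SubstitutionMapSketch m = new SubstitutionMapSketch();
    m.put("a + 1", "slot0");
    m.put("a + 1", "slot1"); // duplicate lhs, ignored
    m.put("b", "slot2");
    System.out.println(m.size()); // prints 2
  }
}
```

The catch, hinted at later in this thread, is that the expr type then needs a hashCode() consistent with its structural equals(), and computing those hashes is itself a (much smaller) cost.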
[jira] [Comment Edited] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap
[ https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852912#comment-17852912 ] Michael Smith edited comment on IMPALA-12800 at 6/6/24 7:09 PM: I've posted two patches at https://gerrit.cloudera.org/c/21484/2 that improve query compilation for the repro from
{code}
# 1st run
Query Compilation: 1m15s
 - Metadata load started: 75.088ms (75.088ms)
 - Metadata load finished. loaded-tables=1/1 load-requests=1 catalog-updates=3 storage-load-time=46ms: 3s137ms (3s062ms)
 - Analysis finished: 7s504ms (4s367ms)
 - Authorization finished (noop): 7s505ms (946.982us)
 - Value transfer graph computed: 7s553ms (47.618ms)
 - Single node plan created: 1m14s (1m7s)
 - Runtime filters computed: 1m15s (874.659ms)
 - Distributed plan created: 1m15s (1.168ms)
 - Planning finished: 1m15s (284.717ms)

# 2nd run
Query Compilation: 1m6s
 - Metadata of all 1 tables cached: 18.799ms (18.799ms)
 - Analysis finished: 3s299ms (3s280ms)
 - Authorization finished (noop): 3s299ms (118.618us)
 - Value transfer graph computed: 3s319ms (19.983ms)
 - Single node plan created: 1m5s (1m2s)
 - Runtime filters computed: 1m6s (808.587ms)
 - Distributed plan created: 1m6s (188.167us)
 - Planning finished: 1m6s (189.985ms)
{code}
to
{code}
# 1st run
Query Compilation: 8s649ms
 - Metadata load started: 62.291ms (62.291ms)
 - Metadata load finished. loaded-tables=1/1 load-requests=1 catalog-updates=3 storage-load-time=46ms: 3s019ms (2s957ms)
 - Analysis finished: 7s021ms (4s002ms)
 - Authorization finished (noop): 7s021ms (569.098us)
 - Value transfer graph computed: 7s070ms (48.329ms)
 - Single node plan created: 8s194ms (1s124ms)
 - Runtime filters computed: 8s261ms (67.366ms)
 - Distributed plan created: 8s365ms (103.186ms)
 - Planning finished: 8s649ms (284.506ms)

# 2nd run
Query Compilation: 4s621ms
 - Metadata of all 1 tables cached: 17.932ms (17.932ms)
 - Analysis finished: 3s391ms (3s373ms)
 - Authorization finished (noop): 3s391ms (133.671us)
 - Value transfer graph computed: 3s412ms (20.547ms)
 - Single node plan created: 4s347ms (935.582ms)
 - Runtime filters computed: 4s399ms (51.706ms)
 - Distributed plan created: 4s434ms (35.380ms)
 - Planning finished: 4s621ms (187.070ms)
{code}
Single node plan creation improves from over 1 minute to ~1 second. Runtime filter computation also seems to have improved by an order of magnitude. There may be some increase from hashing Exprs that makes distributed plan creation a little slower, but still on the order of milliseconds.
[jira] [Commented] (IMPALA-12981) Support a column list in compute stats that is retrieved via a subquery
[ https://issues.apache.org/jira/browse/IMPALA-12981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852872#comment-17852872 ] Michael Smith commented on IMPALA-12981: Common Table Expression optimizations would help if we produce multiple subqueries that rely on the list of column names. > Support a column list in compute stats that is retrieved via a subquery > - > > Key: IMPALA-12981 > URL: https://issues.apache.org/jira/browse/IMPALA-12981 > Project: IMPALA > Issue Type: Improvement > Components: Backend, Frontend >Reporter: Manish Maheshwari >Priority: Major > > Support a column list in COMPUTE STATS that is retrieved via a subquery. > Specifically, we want to use the Impala query history tables, where we collect the > columns in a table that are used for joins, aggregates, filters, etc., to be > passed into the COMPUTE STATS command. > Ideally, this would work by generating a table from > the query history table with the most frequently accessed tables and columns, > and then feeding those into the COMPUTE STATS command. > Suggested syntax: > {code:java} > Table Level - > compute stats db.tbl ( > select distinct join_columns from > from sys.impala_query_log > where contains(tables_queried, "db.tbl") > and query_dttm >current_timestamp()-7 > and join_columns rlike 'db.tbl' > ) > Across Tables - > compute stats on (select tables, columns from sys.impala_query_log where > query_dttm > current_timestamp()-7 group tables, columns by order by tables, > columns, count(1) desc having count(1) > 1000 ) > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12981) Support a column list in compute stats that is retrieved via a subquery
[ https://issues.apache.org/jira/browse/IMPALA-12981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852871#comment-17852871 ] Riza Suminto commented on IMPALA-12981: --- Computing stats over just a subset of columns currently relies on getting the column names from the SQL syntax of COMPUTE STATS: [https://github.com/apache/impala/blob/753ee9b8a80d8e4c0db966a3132446a5aceb05cd/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java#L189-L191] Supporting this feature will require running the subquery first to retrieve the list of column names before running the rest of the COMPUTE STATS child queries.
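The proposal amounts to a two-phase flow: run the subquery first to materialize the column list, then issue the ordinary column-restricted COMPUTE STATS over that list. A rough sketch of the second phase, with hypothetical names (the helper below is illustrative, not an Impala API):

```java
import java.util.List;

// Hypothetical two-phase sketch: phase 1 (not shown) runs the subquery
// against the query history table and returns column names; phase 2 splices
// those names into the existing column-list form of COMPUTE STATS.
public class ComputeStatsRewrite {
  /** Builds the column-restricted COMPUTE STATS statement for 'table'. */
  static String buildComputeStats(String table, List<String> columns) {
    return "COMPUTE STATS " + table + " (" + String.join(", ", columns) + ")";
  }

  public static void main(String[] args) {
    // Assume phase 1's subquery over sys.impala_query_log returned these:
    List<String> cols = List.of("t1_id", "t5_id");
    System.out.println(buildComputeStats("db.tbl", cols));
    // prints COMPUTE STATS db.tbl (t1_id, t5_id)
  }
}
```

In a real implementation the column names would need validation against the table schema (and quoting) before being spliced into the statement, since they originate from query-history data rather than from the parser.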
[jira] [Commented] (IMPALA-13086) Cardinality estimate of AggregationNode should consider predicates on group-by columns
[ https://issues.apache.org/jira/browse/IMPALA-13086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852862#comment-17852862 ] Riza Suminto commented on IMPALA-13086: --- Identifying unique column in Iceberg is possible through [https://iceberg.apache.org/spec/#identifier-field-ids] (see also IMPALA-12729). > Cardinality estimate of AggregationNode should consider predicates on > group-by columns > -- > > Key: IMPALA-13086 > URL: https://issues.apache.org/jira/browse/IMPALA-13086 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Quanlong Huang >Priority: Critical > Attachments: plan.txt > > > Consider the following tables: > {code:sql} > CREATE EXTERNAL TABLE t1( > t1_id bigint, > t5_id bigint, > t5_name string, > register_date string > ) stored as textfile; > CREATE EXTERNAL TABLE t2( > t1_id bigint, > t3_id bigint, > pay_time timestamp, > refund_time timestamp, > state_code int > ) stored as textfile; > CREATE EXTERNAL TABLE t3( > t3_id bigint, > t3_name string, > class_id int > ) stored as textfile; > CREATE EXTERNAL TABLE t5( > id bigint, > t5_id bigint, > t5_name string, > branch_id bigint, > branch_name string > ) stored as textfile; > alter table t1 set tblproperties('numRows'='6031170829'); > alter table t1 set column stats t1_id ('numDVs'='8131016','numNulls'='0'); > alter table t1 set column stats t5_id ('numDVs'='389','numNulls'='0'); > alter table t1 set column stats t5_name > ('numDVs'='523','numNulls'='85928157','maxsize'='27','avgSize'='17.79120063781738'); > alter table t1 set column stats register_date > ('numDVs'='9283','numNulls'='0','maxsize'='8','avgSize'='8'); > alter table t2 set tblproperties('numRows'='864341085'); > alter table t2 set column stats t1_id ('numDVs'='1007302','numNulls'='0'); > alter table t2 set column stats t3_id ('numDVs'='5013','numNulls'='2800503'); > alter table t2 set column stats pay_time ('numDVs'='1372020','numNulls'='0'); > alter table t2 set column stats refund_time > 
('numDVs'='251658','numNulls'='791645118'); > alter table t2 set column stats state_code ('numDVs'='8','numNulls'='0'); > alter table t3 set tblproperties('numRows'='4452'); > alter table t3 set column stats t3_id ('numDVs'='4452','numNulls'='0'); > alter table t3 set column stats t3_name > ('numDVs'='4452','numNulls'='0','maxsize'='176','avgSize'='37.60469818115234'); > alter table t3 set column stats class_id ('numDVs'='75','numNulls'='0'); > alter table t5 set tblproperties('numRows'='2177245'); > alter table t5 set column stats t5_id ('numDVs'='826','numNulls'='0'); > alter table t5 set column stats t5_name > ('numDVs'='523','numNulls'='0','maxsize'='67','avgSize'='19.12560081481934'); > alter table t5 set column stats branch_id ('numDVs'='53','numNulls'='0'); > alter table t5 set column stats branch_name > ('numDVs'='55','numNulls'='0','maxsize'='61','avgSize'='16.05229949951172'); > {code} > Put a data file to each table to make the stats valid > {code:bash} > echo '2024' > data.txt > hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t1 > hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t2 > hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t3 > hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t5 > {code} > REFRESH these tables after adding the data files. 
> The cardinality of AggregationNodes are overestimated in the following query: > {code:sql} > explain select > register_date, > t4.t5_id, > t5.t5_name, > t5.branch_name, > count(distinct t1_id), > count(distinct case when diff_day=0 then t1_id else null end ), > count(distinct case when diff_day<=3 then t1_id else null end ), > count(distinct case when diff_day<=7 then t1_id else null end ), > count(distinct case when diff_day<=14 then t1_id else null end ), > count(distinct case when diff_day<=30 then t1_id else null end ), > count(distinct case when diff_day<=60 then t1_id else null end ), > count(distinct case when pay_time is not null then t1_id else null end ) > from ( > select t1.t1_id,t1.register_date,t1.t5_id,t2.pay_time,t2.t3_id,t3.t3_name, > datediff(pay_time,register_date) diff_day > from ( > select t1_id,pay_time,t3_id from t2 > where state_code = 0 and pay_time>=trunc(NOW(),'Y') > and cast(pay_time as date) <> cast(refund_time as date) > )t2 > join t3
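The complaint in this report is that the AggregationNode's output-cardinality estimate ignores predicates on the group-by columns. A common textbook-style estimate bounds the group count by the product of the group-by columns' NDVs and by the input row count; the requested refinement is to first scale each column's NDV by the selectivity of predicates on it. A back-of-envelope sketch (illustrative formula and names, not Impala's actual estimator):

```java
// Hedged sketch of the estimation idea: bound the number of groups by both
// the input row count and the product of per-column NDVs, where each NDV is
// shrunk by the selectivity of predicates applied to that group-by column.
public class AggCardinalitySketch {
  static long estimateGroups(long inputRows, long[] ndvs, double[] selectivities) {
    double product = 1.0;
    for (int i = 0; i < ndvs.length; i++) {
      // A predicate keeping fraction s of a column's values leaves ~ndv*s
      // distinct values; never let a column contribute fewer than 1.
      product *= Math.max(1.0, ndvs[i] * selectivities[i]);
    }
    return (long) Math.min((double) inputRows, product);
  }

  public static void main(String[] args) {
    // e.g. 9283 distinct register_date values, but a predicate keeps ~1%:
    long est = estimateGroups(6_031_170_829L,
        new long[] {9283, 389}, new double[] {0.01, 1.0});
    System.out.println(est);
  }
}
```

Without the selectivity factor (all selectivities treated as 1.0), the product of raw NDVs dominates and the estimate lands far above the true group count, which matches the overestimation described in the attached plan.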
[jira] [Updated] (IMPALA-12823) Repeated query not found messages in impalad.INFO logs
[ https://issues.apache.org/jira/browse/IMPALA-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar updated IMPALA-12823: -- Component/s: Backend Language: javascript > Repeated query not found messages in impalad.INFO logs > -- > > Key: IMPALA-12823 > URL: https://issues.apache.org/jira/browse/IMPALA-12823 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Surya Hebbar >Assignee: Surya Hebbar >Priority: Minor > Attachments: repeated_impalad_info_logs.png > > > If the page of an unknown or closed query is left open, repeated messages are > produced in the logs, because the page keeps polling the Impala server for a > query that no longer exists. > !repeated_impalad_info_logs.png|width=608,height=84!
[jira] [Closed] (IMPALA-12823) Repeated query not found messages in impalad.INFO logs
[ https://issues.apache.org/jira/browse/IMPALA-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar closed IMPALA-12823. -
[jira] [Updated] (IMPALA-12803) Fix missing exchange lines in query timeline
[ https://issues.apache.org/jira/browse/IMPALA-12803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar updated IMPALA-12803: -- Component/s: Backend Language: javascript > Fix missing exchange lines in query timeline > > > Key: IMPALA-12803 > URL: https://issues.apache.org/jira/browse/IMPALA-12803 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Surya Hebbar >Assignee: Surya Hebbar >Priority: Major > Attachments: missing_exchange_lines.png, proper_exchange_lines.png > > > In the fragment diagram of the query timeline, the exchange lines between > nodes are missing when plan order is used. !missing_exchange_lines.png!
[jira] [Closed] (IMPALA-12803) Fix missing exchange lines in query timeline
[ https://issues.apache.org/jira/browse/IMPALA-12803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar closed IMPALA-12803. -
[jira] [Resolved] (IMPALA-12415) Implement tests for graphical query timeline in webUI
[ https://issues.apache.org/jira/browse/IMPALA-12415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar resolved IMPALA-12415. --- Resolution: Fixed > Implement tests for graphical query timeline in webUI > - > > Key: IMPALA-12415 > URL: https://issues.apache.org/jira/browse/IMPALA-12415 > Project: IMPALA > Issue Type: Task > Components: Backend, Infrastructure >Reporter: Surya Hebbar >Assignee: Surya Hebbar >Priority: Major > > Manually testing the webUI's query timeline each time is unreliable and may > miss edge cases due to human error. > To ensure proper functioning as more features are incorporated, proper test > cases and a testing framework are required for the query timeline. > As a first step toward unit and integration tests, the query timeline's > script should be divided into multiple properly functioning modules.
[jira] [Closed] (IMPALA-12415) Implement tests for graphical query timeline in webUI
[ https://issues.apache.org/jira/browse/IMPALA-12415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar closed IMPALA-12415.
[jira] [Closed] (IMPALA-12364) Display disk and network metrics in webUI's query timeline
[ https://issues.apache.org/jira/browse/IMPALA-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar closed IMPALA-12364.

> Display disk and network metrics in webUI's query timeline
> ----------------------------------------------------------
> Key: IMPALA-12364
> URL: https://issues.apache.org/jira/browse/IMPALA-12364
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend, Infrastructure
> Reporter: Surya Hebbar
> Assignee: Surya Hebbar
> Priority: Major
> Fix For: Impala 4.4.0
>
> Attachments: average_disk_network_metrics.mkv, averaged_disk_network_metrics.png, both_charts_resize.mkv, both_charts_resize.png, close_cpu_utilization_button.mkv, draggable_resize_handle.png, hor_zoom_buttons.png, horizontal_zoom_buttons.mkv, host_utilization_chart_resize.mkv, host_utilization_close_button.png, host_utilization_resize_bar.png, multiple_fragment_metrics.png, resize_drag_handle.mkv
>
> It would be helpful to display disk and network usage in human-readable form
> on the query timeline, aligned with the CPU utilization plot below the
> fragment timing diagram.
[jira] [Resolved] (IMPALA-12364) Display disk and network metrics in webUI's query timeline
[ https://issues.apache.org/jira/browse/IMPALA-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar resolved IMPALA-12364.
Resolution: Fixed
[jira] [Updated] (IMPALA-13105) Multiple imported query profiles fail to import/clear at once
[ https://issues.apache.org/jira/browse/IMPALA-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar updated IMPALA-13105:
Component/s: Backend

> Multiple imported query profiles fail to import/clear at once
> -------------------------------------------------------------
> Key: IMPALA-13105
> URL: https://issues.apache.org/jira/browse/IMPALA-13105
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Surya Hebbar
> Assignee: Surya Hebbar
> Priority: Major
>
> When multiple query profiles are chosen at once, the last query profile in
> the insertion queue fails, because the page reloads without waiting for it
> to be inserted.
>
> The same behavior is seen when clearing all the query profiles.
>
> This is mostly seen in Chromium-based browsers.
[jira] [Updated] (IMPALA-13105) Multiple imported query profiles fail to import/clear at once
[ https://issues.apache.org/jira/browse/IMPALA-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar updated IMPALA-13105:
Language: javascript
[jira] [Closed] (IMPALA-13105) Multiple imported query profiles fail to import/clear at once
[ https://issues.apache.org/jira/browse/IMPALA-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar closed IMPALA-13105.
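The reload-before-commit race described in IMPALA-13105 can be avoided by waiting for the IndexedDB transaction to finish before reloading the page. A minimal sketch of that idea (the `profiles` store name and the surrounding import flow are illustrative, not Impala's actual code):

```javascript
// transactionDone: wrap an IDBTransaction's completion callbacks in a
// promise so callers can await commit before navigating away.
function transactionDone(tx) {
  return new Promise((resolve, reject) => {
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
    tx.onabort = () => reject(tx.error);
  });
}

// Hypothetical import flow: store every selected profile, awaiting each
// transaction's commit, and only then trigger the page reload.
async function importProfiles(db, profiles) {
  for (const profile of profiles) {
    const tx = db.transaction("profiles", "readwrite");
    tx.objectStore("profiles").put(profile);
    await transactionDone(tx); // reloading before this resolves loses the write
  }
  location.reload();
}
```

The same pattern applies to the "clear all" path: await the `clear()` transaction's `oncomplete` before reloading.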
[jira] [Updated] (IMPALA-13106) Support larger imported query profile sizes through compression
[ https://issues.apache.org/jira/browse/IMPALA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar updated IMPALA-13106:
Language: HTML CSS javascript

> Support larger imported query profile sizes through compression
> ---------------------------------------------------------------
> Key: IMPALA-13106
> URL: https://issues.apache.org/jira/browse/IMPALA-13106
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend, Infrastructure
> Reporter: Surya Hebbar
> Assignee: Surya Hebbar
> Priority: Major
>
> Imported query profiles are currently stored in IndexedDB.
> Although IndexedDB does not have the storage limitations of other browser
> storage APIs, there is a limit on the amount of data that can be stored in a
> single attribute/field.
> This imposes a limitation on the size of query profiles. After some testing,
> I have found this limit to be around 220 MB.
> So, it would be helpful to compress the JSON query profiles, allowing for
> much larger query profiles.
[jira] [Updated] (IMPALA-13106) Support larger imported query profile sizes through compression
[ https://issues.apache.org/jira/browse/IMPALA-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar updated IMPALA-13106:
Component/s: Backend, Infrastructure
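One way to compress a JSON profile before writing it to IndexedDB is the browser's built-in CompressionStream API, which gzips the serialized profile into a much smaller Blob. A sketch under that assumption (the function names are illustrative, not Impala's actual implementation):

```javascript
// Gzip a JSON-serializable profile object into a Blob suitable for
// storing as a single IndexedDB value.
async function compressProfile(profileObj) {
  const json = JSON.stringify(profileObj);
  const gzipped = new Blob([json]).stream().pipeThrough(new CompressionStream("gzip"));
  return new Response(gzipped).blob();
}

// Reverse the transformation when the profile is read back.
async function decompressProfile(blob) {
  const plain = blob.stream().pipeThrough(new DecompressionStream("gzip"));
  return JSON.parse(await new Response(plain).text());
}
```

JSON profiles compress well (long repeated key names, counter labels), so this should comfortably clear the ~220 MB per-value ceiling observed above for typical profiles.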
[jira] [Updated] (IMPALA-12364) Display disk and network metrics in webUI's query timeline
[ https://issues.apache.org/jira/browse/IMPALA-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar updated IMPALA-12364:
Component/s: Infrastructure
[jira] [Reopened] (IMPALA-12364) Display disk and network metrics in webUI's query timeline
[ https://issues.apache.org/jira/browse/IMPALA-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar reopened IMPALA-12364.
[jira] [Reopened] (IMPALA-12415) Implement tests for graphical query timeline in webUI
[ https://issues.apache.org/jira/browse/IMPALA-12415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar reopened IMPALA-12415.
[jira] [Updated] (IMPALA-12415) Implement tests for graphical query timeline in webUI
[ https://issues.apache.org/jira/browse/IMPALA-12415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar updated IMPALA-12415:
Component/s: Infrastructure
Language: javascript shell bash (was: javascript)
[jira] [Closed] (IMPALA-12688) Support JSON profile imports in webUI
[ https://issues.apache.org/jira/browse/IMPALA-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar closed IMPALA-12688.

> Support JSON profile imports in webUI
> -------------------------------------
> Key: IMPALA-12688
> URL: https://issues.apache.org/jira/browse/IMPALA-12688
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend
> Reporter: Surya Hebbar
> Assignee: Surya Hebbar
> Priority: Major
> Fix For: Impala 4.4.0
>
> Attachments: clear_all_button.png, descending_order_start_time.png, imported_profiles_section.png, imported_queries_button.png, imported_queries_list.png, imported_queries_page.png, imported_query_statement.png, imported_query_text_plan.png, imported_query_timeline.png, multiple_query_profile_import.png
>
> It would be helpful for users to visualize the query timeline by selecting a
> local JSON query profile.
[jira] [Closed] (IMPALA-12473) Fix query profile's missing event timestamps exception in query timeline
[ https://issues.apache.org/jira/browse/IMPALA-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar closed IMPALA-12473.

> Fix query profile's missing event timestamps exception in query timeline
> ------------------------------------------------------------------------
> Key: IMPALA-12473
> URL: https://issues.apache.org/jira/browse/IMPALA-12473
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Surya Hebbar
> Assignee: Surya Hebbar
> Priority: Minor
> Attachments: q4_json_profile_missing_event_timestamp.png, q64_json_profile_missing_event_timestamp.png
>
> The query profile contains event timestamps for the different fragments and
> their plan nodes.
>
> Sometimes the expected timestamps for an event are missing; this currently
> throws an exception, which stops the rendering of the query timeline.
>
> The issue needs to be fixed by validating these timestamp event labels
> without raising an exception.
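The exception described above comes from assuming every expected event label is present in the profile. A defensive lookup that degrades to null instead of throwing might look like this (the event-array shape and field names are illustrative, not Impala's actual profile schema):

```javascript
// Find an event's timestamp by label; return null when the profile is
// malformed or the expected entry is missing, so rendering can skip the
// event instead of aborting the whole timeline.
function getEventTimestamp(events, label) {
  if (!Array.isArray(events)) return null;
  const event = events.find((e) => e && e.label === label);
  return event && Number.isFinite(event.timestamp) ? event.timestamp : null;
}
```

Callers can then draw only the events whose timestamps resolved, leaving gaps where validation failed.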
[jira] [Closed] (IMPALA-12504) Split graphical query timeline script into modules for testing
[ https://issues.apache.org/jira/browse/IMPALA-12504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar closed IMPALA-12504.

> Split graphical query timeline script into modules for testing
> --------------------------------------------------------------
> Key: IMPALA-12504
> URL: https://issues.apache.org/jira/browse/IMPALA-12504
> Project: IMPALA
> Issue Type: Task
> Components: Backend
> Reporter: Surya Hebbar
> Assignee: Surya Hebbar
> Priority: Minor
>
> The graphical query timeline needs to be split into multiple ES6 modules for
> better maintainability and for writing automated tests.
[jira] [Closed] (IMPALA-12365) Show fragment's memory and thread usage on the query timeline
[ https://issues.apache.org/jira/browse/IMPALA-12365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar closed IMPALA-12365.

> Show fragment's memory and thread usage on the query timeline
> -------------------------------------------------------------
> Key: IMPALA-12365
> URL: https://issues.apache.org/jira/browse/IMPALA-12365
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Surya Hebbar
> Assignee: Surya Hebbar
> Priority: Major
> Attachments: aligned_gridlines.png, aligned_gridlines_and_hovering_scroll.mkv, both_charts_resize.mkv, both_charts_resize.png, clickable_plan_nodes.mkv, clickable_plan_nodes.png, draggable_resize_handle.png, fragment_metrics_chart_resize.mkv, fragment_metrics_close_button.png, fragment_metrics_resize_bar.png, hor_zoom_buttons.png, horizontal_zoom_buttons.mkv, multiple_fragment_metrics.mkv, multiple_fragment_metrics_cropped.png, resize_drag_handle.mkv
>
> The query timeline's fragment diagram can be used to display different
> memory and thread usage metrics, supporting query planning and debugging.
[jira] [Closed] (IMPALA-12417) Query timeline not working when enable asynchronous codegen
[ https://issues.apache.org/jira/browse/IMPALA-12417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar closed IMPALA-12417.

> Query timeline not working when enable asynchronous codegen
> -----------------------------------------------------------
> Key: IMPALA-12417
> URL: https://issues.apache.org/jira/browse/IMPALA-12417
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Zihao Ye
> Assignee: Surya Hebbar
> Priority: Major
> Fix For: Impala 4.3.0
> Attachments: jirabug.png, jirabug2.jpeg
>
> When async_codegen=true is set, the query's timeline page becomes
> unavailable, seemingly because the preset colors did not take into account
> asynchronous codegen events.
> !jirabug2.jpeg!
> !jirabug.png!
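The failure mode in IMPALA-12417 amounts to a palette lookup with no fallback: an event label the palette has never seen (such as an async codegen phase) yields no color and breaks rendering. A sketch of the defensive version (the palette entries and label strings here are illustrative, not Impala's actual colors or event names):

```javascript
// Preset palette keyed by known event labels; labels introduced later
// (e.g. asynchronous codegen events) will not be present.
const PRESET_COLORS = {
  "Planning finished": "#4285f4",
  "Codegen finished": "#fbbc04",
};
const DEFAULT_COLOR = "#9e9e9e"; // neutral fallback for unknown labels

// Return a usable color for every event label instead of undefined.
function eventColor(label) {
  return Object.prototype.hasOwnProperty.call(PRESET_COLORS, label)
    ? PRESET_COLORS[label]
    : DEFAULT_COLOR;
}
```

With a fallback in place, a profile containing unanticipated event labels still renders; the unknown phases simply appear in the neutral color.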
[jira] [Closed] (IMPALA-12364) Display disk and network metrics in webUI's query timeline
[ https://issues.apache.org/jira/browse/IMPALA-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar closed IMPALA-12364.
[jira] [Closed] (IMPALA-12415) Implement tests for graphical query timeline in webUI
[ https://issues.apache.org/jira/browse/IMPALA-12415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surya Hebbar closed IMPALA-12415.