[jira] [Created] (IMPALA-11743) Investigate how to support the OWNER privilege for UDFs in Impala

2022-11-23 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-11743:


 Summary: Investigate how to support the OWNER privilege for UDFs 
in Impala
 Key: IMPALA-11743
 URL: https://issues.apache.org/jira/browse/IMPALA-11743
 Project: IMPALA
  Issue Type: New Feature
  Components: Frontend
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


Currently in Impala, a user who is allowed to create a UDF in a database still 
has to be explicitly granted the privileges needed to execute that UDF later in 
a SELECT query. It would be more convenient if the ownership information of a 
UDF could also be retrieved during the analysis of such SELECT queries, so that 
the owner/creator of a UDF is allowed to execute it without being explicitly 
granted privileges on the UDF.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-11738) Data loading failed at load-functional-query-exhaustive-hive-generated-orc-def-block.sql

2022-11-23 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638030#comment-17638030
 ] 

Joe McDonnell commented on IMPALA-11738:


Actually, the problem is that the toolchain libstdc++ is not in LD_LIBRARY_PATH 
for Hive. A better fix is here: [https://gerrit.cloudera.org/#/c/19276/]

 

> Data loading failed at 
> load-functional-query-exhaustive-hive-generated-orc-def-block.sql
> 
>
> Key: IMPALA-11738
> URL: https://issues.apache.org/jira/browse/IMPALA-11738
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1.1
>Reporter: Yida Wu
>Assignee: Joe McDonnell
>Priority: Major
>
> Ran "./bin/bootstrap_development.sh" to build the system from scratch.
> It seems to crash in hive-server2 when it executes a query
> {code:java}
> select count(*) as mv_count from functional_orc_def.mv1_alltypes_jointbl{code}
> during loading 
> load-functional-query-exhaustive-hive-generated-orc-def-block.sql.
> Found errors in 
> load-functional-query-exhaustive-hive-generated-orc-def-block.sql.log:
> {code:java}
> Unknown HS2 problem when communicating with Thrift server.
> Error: org.apache.thrift.transport.TTransportException: 
> java.net.SocketException: Broken pipe (Write failed) (state=08S01,code=0)
> java.sql.SQLException: org.apache.thrift.transport.TTransportException: 
> java.net.SocketException: Broken pipe (Write failed)
>         at 
> org.apache.hive.jdbc.HiveStatement.closeStatementIfNeeded(HiveStatement.java:225)
>         at 
> org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:266)
>         at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:289)
>         at 
> org.apache.hive.beeline.Commands.executeInternal(Commands.java:1067)
>         at org.apache.hive.beeline.Commands.execute(Commands.java:1217)
>         at org.apache.hive.beeline.Commands.sql(Commands.java:1146)
>         at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1504)
>         at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1362)
>         at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:1336)
>         at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1134)
>         at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1089)
>         at 
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:547)
>         at org.apache.hive.beeline.BeeLine.main(BeeLine.java:529)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:232){code}
> Also found a crash jstack:
> {code:java}
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> j org.apache.hadoop.io.compress.zlib.ZlibCompressor.initIDs()V+0
> j org.apache.hadoop.io.compress.zlib.ZlibCompressor.()V+18
> v ~StubRoutines::call_stub
> j org.apache.hadoop.io.compress.zlib.ZlibFactory.loadNativeZLib()V+6
> j org.apache.hadoop.io.compress.zlib.ZlibFactory.()V+12
> v ~StubRoutines::call_stub
> j 
> org.apache.hadoop.io.compress.DefaultCodec.getDecompressorType()Ljava/lang/Class;+4
> j 
> org.apache.hadoop.io.compress.CodecPool.getDecompressor(Lorg/apache/hadoop/io/compress/CompressionCodec;)Lorg/apache/hadoop/io/compress/Decompressor;+4
> j org.apache.hadoop.io.SequenceFile$Reader.init(Z)V+486
> j 
> org.apache.hadoop.io.SequenceFile$Reader.initialize(Lorg/apache/hadoop/fs/Path;Lorg/apache/hadoop/fs/FSDataInputStream;JJLorg/apache/hadoop/conf/Configuration;Z)V+84
> j 
> org.apache.hadoop.io.SequenceFile$Reader.(Lorg/apache/hadoop/conf/Configuration;[Lorg/apache/hadoop/io/SequenceFile$Reader$Option;)V+407
> j 
> org.apache.hadoop.io.SequenceFile$Reader.(Lorg/apache/hadoop/fs/FileSystem;Lorg/apache/hadoop/fs/Path;Lorg/apache/hadoop/conf/Configuration;)V+17
> j 
> org.apache.hadoop.mapred.SequenceFileRecordReader.(Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/mapred/FileSplit;)V+30
> j 
> org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(Lorg/apache/hadoop/mapred/InputSplit;Lorg/apache/hadoop/mapred/JobConf;Lorg/apache/hadoop/mapred/Reporter;)Lorg/apache/hadoop/mapred/RecordReader;+19
> j 
> org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(Lorg/apache/hadoop/mapred/JobConf;)Lorg/apache/hadoop/mapred/RecordReader;+12
> j 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader()Lorg/apache/hadoop/mapred/RecordReader;+266
> j 
> 

[jira] [Created] (IMPALA-11742) libfesupport.so should hide symbols that are not directly needed

2022-11-23 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-11742:
--

 Summary: libfesupport.so should hide symbols that are not directly 
needed
 Key: IMPALA-11742
 URL: https://issues.apache.org/jira/browse/IMPALA-11742
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 4.2.0
Reporter: Joe McDonnell


libfesupport.so includes a large number of libraries (basically most of 
Impala's C++ code and C++ dependencies), but there are only a limited number of 
functions that users of libfesupport.so need to access externally. We should 
hide the symbols that are not directly needed so that they cannot conflict with 
the user's symbols.

One way to do that is to use ld's version-script to specify a symbol map. It 
can list the global symbols and then exclude all others.

The list of symbols exposed from libfesupport.so is somewhat complicated: it 
includes the JNI functions in fe-support.cc, as well as a large number of 
functions in the exprs directory. Essentially, the functions listed in 
common/function-registry/impala_functions.py need to be exposed.
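For illustration, a version script of the shape described above might look like the following sketch; the exported symbol patterns are placeholders, not the real export list (which would be derived from fe-support.cc and impala_functions.py):

```shell
# Write a sketch of an ld version script. The symbol patterns below are
# illustrative placeholders only.
cat > libfesupport.map <<'EOF'
{
  global:
    Java_*;          /* JNI entry points, e.g. from fe-support.cc */
    extern "C++" {
      "impala::*";   /* illustrative C++ exports from the exprs directory */
    };
  local:
    *;               /* hide every other symbol */
};
EOF
# The map would then be passed to the linker when building the shared library:
#   gcc -shared ... -Wl,--version-script=libfesupport.map -o libfesupport.so
cat libfesupport.map
```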






[jira] [Commented] (IMPALA-11738) Data loading failed at load-functional-query-exhaustive-hive-generated-orc-def-block.sql

2022-11-23 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638007#comment-17638007
 ] 

Joe McDonnell commented on IMPALA-11738:


Just to follow up: the fix here helped on CentOS 7 with ASAN/TSAN/UBSAN (Clang 
builds).

On Ubuntu 18, Hive still crashes during data loading, and the crash is still 
related to libfesupport.so. It reproduces consistently by connecting with 
beeline and running the following statement with compression enabled:
{noformat}
beeline -n $USER -u "jdbc:hive2://localhost:11050/default;auth=none"

SET hive.exec.compress.output=true;
select count(*) as mv_count from 
functional_orc_def.mv1_alltypes_jointbl;{noformat}
Since it does not crash when compression is not enabled, this is a potential 
workaround:

[http://gerrit.cloudera.org:8080/19275]

Feel free to go ahead with this if it seems worthwhile.

> Data loading failed at 
> load-functional-query-exhaustive-hive-generated-orc-def-block.sql
> 
>
> Key: IMPALA-11738
> URL: https://issues.apache.org/jira/browse/IMPALA-11738
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1.1
>Reporter: Yida Wu
>Assignee: Joe McDonnell
>Priority: Major
>
> Ran "./bin/bootstrap_development.sh" to build the system from scratch.
> It seems to crash in hive-server2 when it executes a query
> {code:java}
> select count(*) as mv_count from functional_orc_def.mv1_alltypes_jointbl{code}
> during loading 
> load-functional-query-exhaustive-hive-generated-orc-def-block.sql.
> Found errors in 
> load-functional-query-exhaustive-hive-generated-orc-def-block.sql.log:
> {code:java}
> Unknown HS2 problem when communicating with Thrift server.
> Error: org.apache.thrift.transport.TTransportException: 
> java.net.SocketException: Broken pipe (Write failed) (state=08S01,code=0)
> java.sql.SQLException: org.apache.thrift.transport.TTransportException: 
> java.net.SocketException: Broken pipe (Write failed)
>         at 
> org.apache.hive.jdbc.HiveStatement.closeStatementIfNeeded(HiveStatement.java:225)
>         at 
> org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:266)
>         at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:289)
>         at 
> org.apache.hive.beeline.Commands.executeInternal(Commands.java:1067)
>         at org.apache.hive.beeline.Commands.execute(Commands.java:1217)
>         at org.apache.hive.beeline.Commands.sql(Commands.java:1146)
>         at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1504)
>         at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1362)
>         at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:1336)
>         at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1134)
>         at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1089)
>         at 
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:547)
>         at org.apache.hive.beeline.BeeLine.main(BeeLine.java:529)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:232){code}
> Also found a crash jstack:
> {code:java}
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> j org.apache.hadoop.io.compress.zlib.ZlibCompressor.initIDs()V+0
> j org.apache.hadoop.io.compress.zlib.ZlibCompressor.()V+18
> v ~StubRoutines::call_stub
> j org.apache.hadoop.io.compress.zlib.ZlibFactory.loadNativeZLib()V+6
> j org.apache.hadoop.io.compress.zlib.ZlibFactory.()V+12
> v ~StubRoutines::call_stub
> j 
> org.apache.hadoop.io.compress.DefaultCodec.getDecompressorType()Ljava/lang/Class;+4
> j 
> org.apache.hadoop.io.compress.CodecPool.getDecompressor(Lorg/apache/hadoop/io/compress/CompressionCodec;)Lorg/apache/hadoop/io/compress/Decompressor;+4
> j org.apache.hadoop.io.SequenceFile$Reader.init(Z)V+486
> j 
> org.apache.hadoop.io.SequenceFile$Reader.initialize(Lorg/apache/hadoop/fs/Path;Lorg/apache/hadoop/fs/FSDataInputStream;JJLorg/apache/hadoop/conf/Configuration;Z)V+84
> j 
> org.apache.hadoop.io.SequenceFile$Reader.(Lorg/apache/hadoop/conf/Configuration;[Lorg/apache/hadoop/io/SequenceFile$Reader$Option;)V+407
> j 
> org.apache.hadoop.io.SequenceFile$Reader.(Lorg/apache/hadoop/fs/FileSystem;Lorg/apache/hadoop/fs/Path;Lorg/apache/hadoop/conf/Configuration;)V+17
> j 
> org.apache.hadoop.mapred.SequenceFileRecordReader.(Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/mapred/FileSplit;)V+30
> j 
> 

[jira] [Resolved] (IMPALA-11741) Impala docker builds should verify that 'hostname' is installed

2022-11-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-11741.

Fix Version/s: Impala 4.2.0
   Resolution: Fixed

> Impala docker builds should verify that 'hostname' is installed
> ---
>
> Key: IMPALA-11741
> URL: https://issues.apache.org/jira/browse/IMPALA-11741
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.2.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Blocker
> Fix For: Impala 4.2.0
>
>
> Apparently, the Redhat UBI8 base images don't come with the 'hostname' 
> utility installed. The 'hostname' utility is useful for initializing various 
> Docker-based deployments. We should make sure that it is installed in all 
> Impala Docker images.






[jira] [Commented] (IMPALA-11741) Impala docker builds should verify that 'hostname' is installed

2022-11-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17637999#comment-17637999
 ] 

ASF subversion and git services commented on IMPALA-11741:
--

Commit 52956bae141acf2ecdd7b28ff7edb4d2f2fe3f10 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=52956bae1 ]

IMPALA-11741: Verify that 'hostname' is installed in Docker images

Some deployments rely on having the 'hostname' utility
installed in Impala's Docker image (e.g. for constructing
daemon startup arguments). Most distributions include it
by default, but Redhat UBI8 does not.

This adds 'hostname' to the list of installed packages
for both Ubuntu and the Redhat family. This also verifies
that 'hostname' runs properly.

Testing:
 - Verified that this adds hostname for UBI8 images

Change-Id: I5a760680294a3ad7e74e843d3f4c06cd38819e88
Reviewed-on: http://gerrit.cloudera.org:8080/19273
Reviewed-by: Wenzhe Zhou 
Tested-by: Impala Public Jenkins 
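The check the commit message describes might be sketched as the shell snippet below; the per-distro package names are assumptions, and check_hostname is an illustrative helper, not code from the actual change:

```shell
# Per-distro install commands (assumed, not taken from the actual commit):
#   Ubuntu:      apt-get install -y hostname
#   Redhat UBI8: dnf install -y hostname
# Verify that 'hostname' is present and actually runs; in a Dockerfile RUN
# step a non-zero status here would fail the image build.
check_hostname() {
  command -v hostname >/dev/null 2>&1 || { echo "hostname missing" >&2; return 1; }
  hostname >/dev/null 2>&1 || { echo "hostname not runnable" >&2; return 1; }
  echo "hostname OK"
}
check_hostname || echo "image build would fail here"
```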


> Impala docker builds should verify that 'hostname' is installed
> ---
>
> Key: IMPALA-11741
> URL: https://issues.apache.org/jira/browse/IMPALA-11741
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.2.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Blocker
>
> Apparently, the Redhat UBI8 base images don't come with the 'hostname' 
> utility installed. The 'hostname' utility is useful for initializing various 
> Docker-based deployments. We should make sure that it is installed in all 
> Impala Docker images.






[jira] [Assigned] (IMPALA-11375) impala-shell: log important message for rpc requests/response

2022-11-23 Thread Jason Fehr (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Fehr reassigned IMPALA-11375:
---

Assignee: Jason Fehr

> impala-shell: log important message for rpc requests/response
> -
>
> Key: IMPALA-11375
> URL: https://issues.apache.org/jira/browse/IMPALA-11375
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Abhishek Rawat
>Assignee: Jason Fehr
>Priority: Major
>
> impala-shell client logs can be improved by logging when an RPC request is 
> sent and when a response is received.
> Also, it would be good to add an 'X-Request-ID' HTTP header in hs2-http 
> mode. This is useful for tracing logs end to end when using a reverse proxy 
> such as nginx. It would also be good to include the X-Request-ID in the 
> server logs when they record RPC requests and responses.
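A minimal sketch of the client side of that idea; the endpoint URL is hypothetical, and only the X-Request-ID header name comes from the comment above (no request is actually sent here):

```shell
# Sketch: generate a unique request id and attach it as an HTTP header so a
# reverse proxy (e.g. nginx) and the server logs can correlate the same RPC.
# The hs2-http endpoint URL below is a placeholder.
REQUEST_ID="$(cat /proc/sys/kernel/random/uuid)"
CURL_CMD="curl -H 'X-Request-ID: ${REQUEST_ID}' http://localhost:28000/cliservice"
echo "$CURL_CMD"
```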






[jira] [Created] (IMPALA-11741) Impala docker builds should verify that 'hostname' is installed

2022-11-23 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-11741:
--

 Summary: Impala docker builds should verify that 'hostname' is 
installed
 Key: IMPALA-11741
 URL: https://issues.apache.org/jira/browse/IMPALA-11741
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 4.2.0
Reporter: Joe McDonnell
Assignee: Joe McDonnell


Apparently, the Redhat UBI8 base images don't come with the 'hostname' utility 
installed. The 'hostname' utility is useful for initializing various 
Docker-based deployments. We should make sure that it is installed in all 
Impala Docker images.






[jira] [Updated] (IMPALA-11722) Wrong error message when unsupported complex type comes from * expression

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11722:
---
 Fix Version/s: Impala 4.2.0
Target Version: Impala 4.2.0

> Wrong error message when unsupported complex type comes from * expression
> -
>
> Key: IMPALA-11722
> URL: https://issues.apache.org/jira/browse/IMPALA-11722
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Daniel Becker
>Assignee: Peter Rozsa
>Priority: Major
> Fix For: Impala 4.2.0
>
>
> The following query fails with a NullPointerException:
> {code:java}
> select * from functional_orc_def.complextypestbl;
> ERROR: NullPointerException: null
> {code}
> The table contains a struct, {{nested_struct}}, which is not supported 
> yet because it contains collections. If the columns are listed explicitly, 
> the error message is the correct one:
> {code:java}
> select id, int_array, int_array_array, int_map, int_map_array, nested_struct 
> from functional_orc_def.complextypestbl;
> ERROR: AnalysisException: Struct containing a collection type is not allowed 
> in the select list.{code}
> The same error message should be returned in the select * case.






[jira] [Assigned] (IMPALA-11740) Incorrect results for partitioned Iceberg V2 tables when runtime filters are applied

2022-11-23 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-11740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-11740:
--

Assignee: Zoltán Borók-Nagy

> Incorrect results for partitioned Iceberg V2 tables when runtime filters are 
> applied
> 
>
> Key: IMPALA-11740
> URL: https://issues.apache.org/jira/browse/IMPALA-11740
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> If an Iceberg V2 table is partitioned and contains delete files, then 
> queries that apply runtime filters to the partition columns return an empty 
> result set.
> {noformat}
> select count(*) from store_sales, date_dim where d_date_sk = ss_sold_date_sk 
> and d_moy=2 and d_year=1998;
> {noformat}
> In the above query, store_sales is partitioned by ss_sold_date_sk, which is 
> filtered by runtime filters created by the JOIN. If store_sales has delete 
> files, the above query returns an empty result set.






[jira] [Created] (IMPALA-11740) Incorrect results for partitioned Iceberg V2 tables when runtime filters are applied

2022-11-23 Thread Jira
Zoltán Borók-Nagy created IMPALA-11740:
--

 Summary: Incorrect results for partitioned Iceberg V2 tables when 
runtime filters are applied
 Key: IMPALA-11740
 URL: https://issues.apache.org/jira/browse/IMPALA-11740
 Project: IMPALA
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


If an Iceberg V2 table is partitioned and contains delete files, then queries 
that apply runtime filters to the partition columns return an empty result 
set.

{noformat}
select count(*) from store_sales, date_dim where d_date_sk = ss_sold_date_sk 
and d_moy=2 and d_year=1998;
{noformat}

In the above query, store_sales is partitioned by ss_sold_date_sk, which is 
filtered by runtime filters created by the JOIN. If store_sales has delete 
files, the above query returns an empty result set.






[jira] [Updated] (IMPALA-10570) FE tests get stuck and eventually time out during UBSAN build

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-10570:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> FE tests get stuck and eventually time out during UBSAN build
> -
>
> Key: IMPALA-10570
> URL: https://issues.apache.org/jira/browse/IMPALA-10570
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0.0
>Reporter: Laszlo Gaal
>Priority: Blocker
>  Labels: broken-build
>
> During UBSAN builds on private infrastructure (using CentOS 7.4 as the OS 
> platform) FE tests get stuck, then eventually get killed by the timeout 
> mechanism in buildall.sh. Unfortunately output buffering makes the Jenkins 
> console log slightly confusing, so it is not easy to tell which exact FE test 
> is getting stuck: the timeout and build shutdown sequences seem to be 
> artificially inserted into the middle of the FE build result summary.
> Representative log section:
> {code}
> 01:44:28.496 [INFO] Tests run: 99, Failures: 0, Errors: 0, Skipped: 0, Time 
> elapsed: 0.305 s - in org.apache.impala.analysis.ParserTest
> 01:44:28.496 [INFO] Running org.apache.impala.analysis.ToSqlTest
> 01:44:28.496 [INFO] Tests run: 41, Failures: 0, Errors: 0, Skipped: 0, Time 
> elapsed: 0.88 s - in org.apache.impala.analysis.ToSqlTest
> 01:44:28.496 [INFO] Running org.apache.impala.analysis.Ana
> 01:44:28.496 
> 20:55:34.940  run-all-tests.sh TIMED OUT! 
> 20:55:34.943 
> 20:55:34.943 
> 20:55:34.943  Generating backtrace of impalad with process id: 15971 to 
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/logs/timeout_stacktrace/impalad_15971_20210308-093840.txt
>  
> [. lots of debug and strack trace output elided for brevity's sake]
> [ complete log section for build shutdown elided.]
> 20:57:13.167 
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/common/thrift/MetricDefs.thrift
>  created.
> 20:57:13.167 
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/impala_schema.mdl 
> created.
> 20:57:13.167 + pg_dump -U hiveuser 
> HMS_data_jenkins_workspace_impala_cdpd_master_core_ubsan_re_cdp
> 20:57:13.167 + exit 1
> 20:57:13.502 Process leaked file descriptors. See 
> https://jenkins.io/redirect/troubleshooting/process-leaked-file-descriptors 
> for more information
> 20:57:23.505 Build step 'Execute shell' marked build as failure
> 20:57:23.576 lyzeStmtsTest
> 20:57:27.934 [INFO] Tests run: 67, Failures: 0, Errors: 0, Skipped: 0, Time 
> elapsed: 1.999 s - in org.apache.impala.analysis.AnalyzeStmtsTest
> 20:57:27.934 [INFO] Running org.apache.impala.analysis.ExprRewriteRulesTest
> 20:57:27.934 [INFO] Tests run: 21, Failures: 0, Errors: 0, Skipped: 0, Time 
> elapsed: 0.121 s - in org.apache.impala.analysis.ExprRewriteRulesTest
> [ rest of the FE test result summary follows.]
> {code}






[jira] [Updated] (IMPALA-2210) Make Parquet the default file format

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-2210:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Make Parquet the default file format
> 
>
> Key: IMPALA-2210
> URL: https://issues.apache.org/jira/browse/IMPALA-2210
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.2.4
>Reporter: John Russell
>Priority: Blocker
>  Labels: incompatibility, newbie, usability
>
> I expect that by far the most common use case for CREATE TABLE LIKE PARQUET 
> is to make a table where the specified Parquet file will be queried.  That 
> is, either:
> CREATE TABLE foo LIKE PARQUET '/blah/blah/file.parq' STORED AS PARQUET;
> LOAD DATA INFILE '/blah/blah/file.parq' INTO TABLE foo;
> or
> CREATE EXTERNAL TABLE foo LIKE PARQUET '/blah/blah/file.parq' STORED AS 
> PARQUET LOCATION '/blah/blah';
> I have difficulty imagining a case where someone would do CREATE TABLE LIKE 
> PARQUET and want the result to be a text table. Even if someone planned to 
> convert Parquet -> text, they would need to have a Parquet table to begin 
> with, in which case they would do CREATE TABLE text_table LIKE parquet_table, 
> not CREATE TABLE LIKE PARQUET.
> It is easy to leave off the STORED AS PARQUET clause by mistake from a CTLP 
> statement, because PARQUET already occurs earlier in the statement, resulting 
> in a text table that throws conversion errors when queried. How about making 
> Parquet the default format in this case, and requiring the STORED AS clause 
> only to use a different file format? (Then if Impala implemented a CREATE 
> TABLE LIKE AVRO syntax, the default in that case would be Avro.)
> Since I guess this would qualify as an incompatible change, we would need to 
> think through the appropriate release vehicle.






[jira] [Updated] (IMPALA-9486) Creating a Kudu table via JDBC fails with "IllegalArgumentException"

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-9486:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Creating a Kudu table via JDBC fails with "IllegalArgumentException"
> 
>
> Key: IMPALA-9486
> URL: https://issues.apache.org/jira/browse/IMPALA-9486
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.2.0
>Reporter: Grant Henke
>Assignee: Fang-Yu Rao
>Priority: Blocker
>
> A Kudu user reported that, though creating tables via impala-shell or Hue 
> works, when using an external tool connected via JDBC the create statement 
> fails with the following:
> {noformat}
> [ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, 
> SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, 
> errorMessage:ImpalaRuntimeException: Error creating Kudu table 
> 'impala::default.foo' CAUSED BY: IllegalArgumentException: table owner must 
> not be null or empty ), Query: …
> {noformat}
>  
> When debugging the issue further it looks like the call to set the owner on 
> the Kudu table should not be called if an owner is not explicitly set:
> [https://github.com/apache/impala/blob/497a17dbdc0669abd47c2360b8ca94de8b54d413/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java#L252]
>  
> A possible fix could be to guard the call with _isSetOwner_:
> {code:java}
> if (msTbl.isSetOwner()) {
>   tableOpts.setOwner(msTbl.getOwner());
> }
> {code}






[jira] [Updated] (IMPALA-10344) Error loading functional_orc_def.alltypesaggmultifiles in hive

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-10344:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Error loading functional_orc_def.alltypesaggmultifiles in hive
> --
>
> Key: IMPALA-10344
> URL: https://issues.apache.org/jira/browse/IMPALA-10344
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 4.0.0
>Reporter: Tim Armstrong
>Priority: Critical
>  Labels: broken-build, flaky
> Attachments: hive-metastore.out, hive-server2.out, 
> load-functional-query-exhaustive-hive-generated-orc-def-block.sql.log
>
>
> Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. 
> Exception when loading 11 partitions in table alltypesaggmultifiles 
> {noformat}
> 0: jdbc:hive2://localhost:11050/default> insert into table 
> functional_orc_def.al 
> ltypesaggmultifiles partition (year, month, day) SELECT id, bool_col, 
> tinyint_co 
> l, smallint_col, int_col, bigint_col, float_col, double_col, date_string_col, 
> st 
> ring_col, timestamp_col, year, month, day FROM 
> functional.alltypesaggmultifiles  
> where id % 4 = 1;
> going to print operations logs
> printed operations logs
> going to print operations logs
> INFO  : Compiling 
> command(queryId=jenkins_20201113135913_d2ecc422-ec51-4fe2-9e92-e8d93b01aca9): 
> insert into table functional_orc_def.alltypesaggmultifiles partition (year, 
> month, day) SELECT id, bool_col, tinyint_col, smallint_col, int_col, 
> bigint_col, float_col, double_col, date_string_col, string_col, 
> timestamp_col, year, month, day FROM functional.alltypesaggmultifiles where 
> id % 4 = 1
> INFO  : No Stats for functional@alltypesaggmultifiles, Columns: float_col, 
> int_col, string_col, bool_col, date_string_col, smallint_col, timestamp_col, 
> tinyint_col, bigint_col, id, double_col
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:id, 
> type:int, comment:null), FieldSchema(name:bool_col, type:boolean, 
> comment:null), FieldSchema(name:tinyint_col, type:tinyint, comment:null), 
> FieldSchema(name:smallint_col, type:smallint, comment:null), 
> FieldSchema(name:int_col, type:int, comment:null), 
> FieldSchema(name:bigint_col, type:bigint, comment:null), 
> FieldSchema(name:float_col, type:float, comment:null), 
> FieldSchema(name:double_col, type:double, comment:null), 
> FieldSchema(name:date_string_col, type:string, comment:null), 
> FieldSchema(name:string_col, type:string, comment:null), 
> FieldSchema(name:timestamp_col, type:timestamp, comment:null), 
> FieldSchema(name:year, type:int, comment:null), FieldSchema(name:month, 
> type:int, comment:null), FieldSchema(name:day, type:int, comment:null)], 
> properties:null)
> INFO  : Completed compiling 
> command(queryId=jenkins_20201113135913_d2ecc422-ec51-4fe2-9e92-e8d93b01aca9); 
> Time taken: 0.036 seconds
> INFO  : Executing 
> command(queryId=jenkins_20201113135913_d2ecc422-ec51-4fe2-9e92-e8d93b01aca9): 
> insert into table functional_orc_def.alltypesaggmultifiles partition (year, 
> month, day) SELECT id, bool_col, tinyint_col, smallint_col, int_col, 
> bigint_col, float_col, double_col, date_string_col, string_col, 
> timestamp_col, year, month, day FROM functional.alltypesaggmultifiles where 
> id % 4 = 1
> INFO  : Query ID = jenkins_20201113135913_d2ecc422-ec51-4fe2-9e92-e8d93b01aca9
> INFO  : Total jobs = 1
> INFO  : Launching Job 1 out of 1
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
> INFO  : Subscribed to counters: [] for queryId: 
> jenkins_20201113135913_d2ecc422-ec51-4fe2-9e92-e8d93b01aca9
> INFO  : Session is already open
> INFO  : Dag name: insert into table functional_orc_def.all...1 (Stage-1)
> INFO  : Status: Running (Executing on YARN cluster with App id 
> application_1605304119768_0012)
> printed operations logs
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  
> RUNNING  PENDING  FAILED  KILLED  
> --
> Map 1 .. container SUCCEEDED 42 420   
>  0   0   0  
> Reducer 2container   RUNNING  1  010  
>  0   0  
> --
> VERTICES: 01/02  [=>>-] 97%   ELAPSED 
> TIME: 1.46 s 
> --
> 

[jira] [Updated] (IMPALA-10575) Expired sessions not closed in Impala

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-10575:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Expired sessions not closed in Impala
> -
>
> Key: IMPALA-10575
> URL: https://issues.apache.org/jira/browse/IMPALA-10575
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.2.0
>Reporter: Jimi
>Priority: Blocker
>  Labels: impala
> Attachments: image-2021-03-10-00-17-41-487.png
>
>
> JDBC query options:
> {code:java}
> jdbc:impala://ip:port/cdp;idle_session_timeout=10;QUERY_TIMEOUT_S=10
> {code}
> but the expired session is not closed, like this:
> !image-2021-03-10-00-17-41-487.png|width=1367,height=199!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10567) Failed close open session in flight on impalad web UI

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-10567:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Failed close open session in flight on impalad web UI
> -
>
> Key: IMPALA-10567
> URL: https://issues.apache.org/jira/browse/IMPALA-10567
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.2.0
>Reporter: Jimi
>Priority: Blocker
> Attachments: image-2021-03-09-11-12-01-488.png, 
> image-2021-03-09-11-12-10-922.png, image-2021-03-09-11-12-14-683.png
>
>
> When the fe_service_threads limit is reached, open connections from the JDBC 
> client hang.
> Then I close the in-flight session on the impalad web UI, but it still 
> hangs. My Impala version is 3.2.0-SNAPSHOT.
> After close in-flight session on impalad web UI, some metrics follows:
> !image-2021-03-09-11-12-01-488.png|width=1509,height=246!
> !image-2021-03-09-11-12-10-922.png|width=1563,height=221!!image-2021-03-09-11-12-14-683.png|width=1826,height=223!
> {code:java}
> // code placeholder
> impala-server.num-fragments-in-flight:63
> impala-server.num-open-hiveserver2-sessions:63
> impala-server.num-queries-registered:63
> impala.thrift-server.hiveserver2-frontend.connections-in-use:64
> {code}
>  






[jira] [Updated] (IMPALA-6890) split-hbase.sh: Can't get master address from ZooKeeper; znode data == null

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-6890:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> split-hbase.sh: Can't get master address from ZooKeeper; znode data == null
> ---
>
> Key: IMPALA-6890
> URL: https://issues.apache.org/jira/browse/IMPALA-6890
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.12.0
>Reporter: Vuk Ercegovac
>Assignee: Joe McDonnell
>Priority: Critical
>
> {noformat}
> 20:57:13 FAILED (Took: 7 min 58 sec)
> 20:57:13 
> '/data/jenkins/workspace/impala-cdh5-2.12.0_5.15.0-exhaustive-thrift/repos/Impala/testdata/bin/split-hbase.sh'
>  failed. Tail of log:
> 20:57:13 Wed Apr 18 20:49:43 PDT 2018, 
> RpcRetryingCaller{globalStartTime=1524109783051, pause=100, retries=31}, 
> org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> 20:57:13 Wed Apr 18 20:49:43 PDT 2018, 
> RpcRetryingCaller{globalStartTime=1524109783051, pause=100, retries=31}, 
> org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> 20:57:13 Wed Apr 18 20:49:44 PDT 2018, 
> RpcRetryingCaller{globalStartTime=1524109783051, pause=100, retries=31}, 
> org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> ...
> 20:57:13 Wed Apr 18 20:57:13 PDT 2018, 
> RpcRetryingCaller{globalStartTime=1524109783051, pause=100, retries=31}, 
> org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> 20:57:13 
> 20:57:13  at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:157)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4329)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4321)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.HBaseAdmin.getClusterStatus(HBaseAdmin.java:2952)
> 20:57:13  at 
> org.apache.impala.datagenerator.HBaseTestDataRegionAssigment.(HBaseTestDataRegionAssigment.java:74)
> 20:57:13  at 
> org.apache.impala.datagenerator.HBaseTestDataRegionAssigment.main(HBaseTestDataRegionAssigment.java:310)
> 20:57:13 Caused by: org.apache.hadoop.hbase.MasterNotRunningException: 
> java.io.IOException: Can't get master address from ZooKeeper; znode data == 
> null
> 20:57:13  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1698)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1718)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1875)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
> 20:57:13  ... 5 more
> 20:57:13 Caused by: java.io.IOException: Can't get master address from 
> ZooKeeper; znode data == null
> 20:57:13  at 
> org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:154)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1648)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1689)
> 20:57:13  ... 9 more
> 20:57:13 Error in 
> /data/jenkins/workspace/impala-cdh5-2.12.0_5.15.0-exhaustive-thrift/repos/Impala/testdata/bin/split-hbase.sh
>  at line 41: "$JAVA" ${JAVA_KERBEROS_MAGIC} \
> 20:57:13 Error in 
> /data/jenkins/workspace/impala-cdh5-2.12.0_5.15.0-exhaustive-thrift/repos/Impala/bin/run-all-tests.sh
>  at line 48: # Run End-to-end Tests{noformat}
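The log above shows HBase's RpcRetryingCaller looping with a fixed pause (pause=100 ms, retries=31) until it gives up. As an illustrative model only (not HBase's actual client code), that fixed-pause retry pattern looks like:

```python
import time

def call_with_retries(fn, retries=31, pause_s=0.1):
    """Invoke fn(attempt) up to `retries` times, sleeping a fixed pause
    between attempts, mirroring the pause/retries values in the log."""
    last_err = None
    for attempt in range(retries):
        try:
            return fn(attempt)
        except IOError as err:
            last_err = err
            time.sleep(pause_s)
    raise last_err

# Example: the callable fails twice, then succeeds on the third attempt.
def flaky(attempt):
    if attempt < 2:
        raise IOError("Can't get master address from ZooKeeper; znode data == null")
    return "master-address"

assert call_with_retries(flaky, pause_s=0) == "master-address"
```

In the failure quoted here, every one of the 31 attempts hit the same `znode data == null` error, so the loop exhausted its retries and the script aborted.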






[jira] [Updated] (IMPALA-10338) TestAdmissionController.test_queue_reasons_slots flaky because of slow/hanging fragment

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-10338:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> TestAdmissionController.test_queue_reasons_slots flaky because of 
> slow/hanging fragment
> ---
>
> Key: IMPALA-10338
> URL: https://issues.apache.org/jira/browse/IMPALA-10338
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0.0
>Reporter: Tim Armstrong
>Priority: Critical
>  Labels: broken-build, flaky, hang
> Attachments: failure-output.txt, impalad.ERROR, impalad.INFO
>
>
> This is on an s3 debug build, commit 5a00a4c06f8ec40a8867dcbc036cf5bb47b8a3be
> {noformat}
> custom_cluster.test_admission_controller.TestAdmissionController.test_queue_reasons_slots
>  (from pytest)
> Failing for the past 1 build (Since Failed#673 )
> Took 1 min 58 sec.
> add description
> Error Message
> Timeout: query 504b3a2511f3cd0e:e27bec6b did not reach one of the 
> expected states [4], last known state 3
> Stacktrace
> custom_cluster/test_admission_controller.py:967: in test_queue_reasons_slots
> TIMEOUT_S, config_options={"mt_dop": 4})
> custom_cluster/test_admission_controller.py:277: in 
> _execute_and_collect_profiles
> state = self.wait_for_any_state(handle, expected_states, timeout_s)
> common/impala_test_suite.py:1081: in wait_for_any_state
> actual_state))
> E   Timeout: query 504b3a2511f3cd0e:e27bec6b did not reach one of the 
> expected states [4], last known state 3
> {noformat}
> Those numbers are beeswax QueryStates:
> {noformat}
> enum QueryState {
>   CREATED = 0
>   INITIALIZED = 1
>   COMPILED = 2
>   RUNNING = 3
>   FINISHED = 4
>   EXCEPTION = 5
> }
> {noformat}
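The `wait_for_any_state` helper in the stack trace polls the query state against that enum until a timeout. A minimal sketch of such a wait loop (hypothetical, not the actual impala_test_suite implementation):

```python
import time

RUNNING, FINISHED = 3, 4  # from the beeswax QueryState enum above

def wait_for_any_state(get_state, expected_states, timeout_s, poll_s=0.01):
    """Poll get_state() until it returns one of expected_states, or raise
    with the last known state once timeout_s elapses."""
    state = None
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        state = get_state()
        if state in expected_states:
            return state
        time.sleep(poll_s)
    raise TimeoutError("query did not reach one of the expected states "
                       "%s, last known state %s" % (expected_states, state))

# The query moves RUNNING -> RUNNING -> FINISHED within the timeout.
states = iter([RUNNING, RUNNING, FINISHED])
assert wait_for_any_state(lambda: next(states), [FINISHED], timeout_s=5) == FINISHED
```

In the flaky run, the query stayed in state 3 (RUNNING) for the whole timeout window, producing exactly the "last known state 3" message in the failure.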
> I.e. it appears to have run for > 60 seconds.
> The timing of that query:
> {noformat}
> I1116 13:25:08.449323 32665 impala-server.cc:1242] 
> 504b3a2511f3cd0e:e27bec6b] Registered query 
> query_id=504b3a2511f3cd0e:e27bec6b 
> session_id=874c2c9eaf2ad730:0004c1bb0ba7b4a7
> I1116 13:25:08.449626 32665 Frontend.java:1532] 
> 504b3a2511f3cd0e:e27bec6b] Analyzing query: select 
> min(ss_wholesale_cost) from tpcds_parquet.store_sales db: default
> ...
> I1116 13:25:08.567667   367 admission-controller.cc:1532] 
> 504b3a2511f3cd0e:e27bec6b] Scheduling query 
> 504b3a2511f3cd0e:e27bec6b with membership version 2
> I1116 13:25:08.567767   367 admission-controller.cc:1590] 
> 504b3a2511f3cd0e:e27bec6b] Scheduling for executor group: 
> default-pool-group1 with 3 executors
> I1116 13:25:08.643026   367 admission-controller.cc:1640] 
> 504b3a2511f3cd0e:e27bec6b] Trying to admit query to pool default-pool 
> in executor group default-pool-group1 (3 executors)
> ...
> I1116 13:25:49.184185 32432 admission-controller.cc:1811] Admitting from 
> queue: query=504b3a2511f3cd0e:e27bec6b
> I1116 13:25:49.184196 32432 admission-controller.cc:1903] For Query 
> 504b3a2511f3cd0e:e27bec6b per_backend_mem_limit set to: -1.00 B 
> per_backend_mem_to_admit set to: 114.02 MB coord_backend_mem_limit set to: 
> -1.00 B coord_backend_mem_to_admit set to: 114.02 MB
> I1116 13:25:49.184350   367 admission-controller.cc:1288] 
> 504b3a2511f3cd0e:e27bec6b] Admitted queued query 
> id=504b3a2511f3cd0e:e27bec6b
> I1116 13:25:49.184370   367 admission-controller.cc:1289] 
> 504b3a2511f3cd0e:e27bec6b] Final: agg_num_running=1, 
> agg_num_queued=0, agg_mem_reserved=17.25 KB,  
> local_host(local_mem_admitted=342.05 MB, num_admitted_running=1, 
> num_queued=0, backend_mem_reserved=17.25 KB, topN_query_stats: 
> queries=[0a4232658ea48bc5:c0156469, 5b470ebea7782154:ea22bdb0, 
> 554d88d8f812e22d:efbaf752, aa4a301189a4d144:fdc17ff7], 
> total_mem_consumed=17.25 KB, fraction_of_pool_total_mem=1; pool_level_stats: 
> num_running=4, min=0, max=17.25 KB, pool_total_mem=17.25 KB, 
> average_per_query=4.31 KB)
> I1116 13:25:49.185214   367 impala-server.cc:2062] 
> 504b3a2511f3cd0e:e27bec6b] Registering query locations
> I1116 13:25:49.185261   367 coordinator.cc:149] 
> 504b3a2511f3cd0e:e27bec6b] Exec() 
> query_id=504b3a2511f3cd0e:e27bec6b stmt=select 
> min(ss_wholesale_cost) from tpcds_parquet.store_sales
> I1116 13:25:49.186172   367 coordinator.cc:473] 
> 504b3a2511f3cd0e:e27bec6b] starting execution on 3 backends for 
> query_id=504b3a2511f3cd0e:e27bec6b
> I1116 13:25:49.189028 32071 control-service.cc:142] 
> 504b3a2511f3cd0e:e27bec6b] ExecQueryFInstances(): 
> query_id=504b3a2511f3cd0e:e27bec6b 
> coord=impala-ec2-centos74-m5-4xlarge-ondemand-018e.vpc.cloudera.com:27000 
> #instances=5
> 

[jira] [Updated] (IMPALA-4741) ORDER BY behavior with UNION is incorrect

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-4741:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> ORDER BY behavior with UNION is incorrect
> -
>
> Key: IMPALA-4741
> URL: https://issues.apache.org/jira/browse/IMPALA-4741
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.8.0
>Reporter: Greg Rahn
>Priority: Critical
>  Labels: correctness, incompatibility, ramp-up, sql-language, 
> tpc-ds
> Attachments: query36a.sql, query49.sql
>
>
> When a query uses the UNION, EXCEPT, or INTERSECT operators, the ORDER BY 
> clause must be specified at the end of the statement and the results of the 
> combined queries are sorted.  ORDER BY clauses are not allowed in individual 
> branches unless the branch is enclosed by parentheses.
> There are two bugs currently:
> # An ORDER BY is allowed in a branch of a UNION that is not enclosed in 
> parentheses
> # The final ORDER BY of a UNION is attached to the nearest branch when it 
> should be sorting the combined results of the UNION(s)
> For example, this is not valid syntax but is allowed in Impala
> {code}
> select * from t1 order by 1
> union all
> select * from t2
> {code}
> And for queries like this, the ORDER BY should order the unioned result, not 
> just the nearest branch which is the current behavior.
> {code}
> select * from t1
> union all
> select * from t2
> order by 1
> {code}
> If one wants ordering within a branch, the query block must be enclosed by 
> parentheses like such:
> {code}
> (select * from t1 order by 1)
> union all
> (select * from t2 order by 2)
> {code}
> Here is an example where incorrect results are returned.
> Impala
> {code}
> [impalad:21000] > select r_regionkey, r_name from region union all select 
> r_regionkey, r_name from region order by 1 limit 2;
> +-+-+
> | r_regionkey | r_name  |
> +-+-+
> | 0   | AFRICA  |
> | 1   | AMERICA |
> | 2   | ASIA|
> | 3   | EUROPE  |
> | 4   | MIDDLE EAST |
> | 0   | AFRICA  |
> | 1   | AMERICA |
> +-+-+
> Fetched 7 row(s) in 0.12s
> {code}
> PostgreSQL
> {code}
> tpch=# select r_regionkey, r_name from region union all select r_regionkey, 
> r_name from region order by 1 limit 2;
>  r_regionkey |  r_name
> -+---
>0 | AFRICA
>0 | AFRICA
> (2 rows) 
> {code}
> see also https://cloud.google.com/spanner/docs/query-syntax#syntax_5
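The two result sets above can be modeled with plain lists to contrast the correct semantics (final ORDER BY/LIMIT over the combined rows) with the reported Impala behavior (ORDER BY bound to the nearest branch). A small Python sketch:

```python
# Rows of the region table as (r_regionkey, r_name), per the example output.
region = [(0, "AFRICA"), (1, "AMERICA"), (2, "ASIA"),
          (3, "EUROPE"), (4, "MIDDLE EAST")]

# Correct semantics: ORDER BY 1 LIMIT 2 applies to the combined UNION ALL rows.
correct = sorted(region + region, key=lambda row: row[0])[:2]
assert correct == [(0, "AFRICA"), (0, "AFRICA")]  # matches the PostgreSQL output

# Reported bug: the ORDER BY/LIMIT attaches only to the nearest (second)
# branch, so the first branch's 5 rows pass through unmodified, followed by
# the 2 sorted-and-limited rows -- 7 rows total.
buggy = region + sorted(region, key=lambda row: row[0])[:2]
assert len(buggy) == 7  # matches the 7 rows Impala returned
```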






[jira] [Updated] (IMPALA-7083) AnalysisException for GROUP BY and ORDER BY expressions that are folded to constants from 2.9 onwards

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-7083:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> AnalysisException for GROUP BY and ORDER BY expressions that are folded to 
> constants from 2.9 onwards
> -
>
> Key: IMPALA-7083
> URL: https://issues.apache.org/jira/browse/IMPALA-7083
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.9.0
>Reporter: Eric Lin
>Priority: Critical
>  Labels: regression
>
> To reproduce, please run below impala query:
> {code}
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (a int);
> SELECT   ( 
> CASE 
>WHEN (1 =1) 
>THEN 1
>ELSE a
> end) AS b
> FROM  test 
> GROUP BY 1 
> ORDER BY ( 
> CASE 
>WHEN (1 =1) 
>THEN 1
>ELSE a
> end);
> {code}
> It will fail with below error:
> {code}
> ERROR: AnalysisException: ORDER BY expression not produced by aggregation 
> output (missing from GROUP BY clause?): (CASE WHEN TRUE THEN 1 ELSE a END)
> {code}
> However, if I replace column name "a" as a constant value, it works:
> {code}
> SELECT   ( 
> CASE 
>WHEN (1 =1) 
>THEN 1
>ELSE 2
> end) AS b
> FROM  test 
> GROUP BY 1 
> ORDER BY ( 
> CASE 
>WHEN (1 =1) 
>THEN 1
>ELSE 2
> end);
> {code}
> This issue is identified in CDH5.12.x (Impala 2.9), and no issues in 5.11.x 
> (Impala 2.8).
> We know that it can be worked around by re-write as below:
> {code}
> SELECT   ( 
> CASE 
>WHEN (1 =1) 
>THEN 1
>ELSE a
> end) AS b
> FROM  test 
> GROUP BY 1 
> ORDER BY 1;
> {code}
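One way to picture the regression: the analyzer constant-folds the ORDER BY expression (CASE WHEN TRUE THEN 1 ELSE a END becomes 1) before matching it against the GROUP BY output, so the folded form no longer equals the unfolded SELECT expression. A toy Python model of that mismatch (purely illustrative, not Impala's analyzer code):

```python
def fold(expr):
    """Fold a ("CASE", cond, then_val, else_val) tuple when cond is a
    constant boolean; otherwise return the expression unchanged."""
    op, cond, then_val, else_val = expr
    if cond is True:
        return then_val
    if cond is False:
        return else_val
    return expr

# The SELECT-list expression as written, with the predicate still symbolic.
select_expr = ("CASE", ("=", 1, 1), 1, ("col", "a"))

# During analysis (1 = 1) is evaluated to True, so the ORDER BY CASE folds to 1.
folded_order_by = fold(("CASE", True, 1, ("col", "a")))
assert folded_order_by == 1

# The folded ORDER BY no longer matches the unfolded grouping expression,
# which is what surfaces as the AnalysisException.
assert folded_order_by != select_expr
```

This also suggests why `ORDER BY 1` (the ordinal) works as a workaround: an ordinal is resolved positionally against the SELECT list, so no expression comparison is needed.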






[jira] [Updated] (IMPALA-10924) TestIcebergTable.test_partitioned_insert fails with IOException

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-10924:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> TestIcebergTable.test_partitioned_insert fails with IOException
> ---
>
> Key: IMPALA-10924
> URL: https://issues.apache.org/jira/browse/IMPALA-10924
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Zoltán Borók-Nagy
>Priority: Critical
>  Labels: broken-build
>
> The test query_test.test_iceberg.TestIcebergTable.test_partitioned_insert 
> fails intermittently with an IOException and the stack trace below.
> {noformat}
> query_test/test_iceberg.py:80: in test_partitioned_insert
> use_db=unique_database)
> common/impala_test_suite.py:682: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:620: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:940: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:212: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:189: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:367: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:388: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:RuntimeIOException: Failed to write json to file: 
> hdfs://localhost:20500/test-warehouse/test_partitioned_insert_af8be2c3.db/ice_only_part/metadata/2-b8d13a74-4839-4dd3-b74a-6df9436774a2.metadata.json
> E   CAUSED BY: IOException: The stream is closed
> {noformat}






[jira] [Updated] (IMPALA-10236) Queries stuck if catalog topic update compression fails

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-10236:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Queries stuck if catalog topic update compression fails
> ---
>
> Key: IMPALA-10236
> URL: https://issues.apache.org/jira/browse/IMPALA-10236
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.12.0
>Reporter: Shant Hovsepian
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>  Labels: hang, supportability
>
> If a catalog object that is to be compressed doesn't fit into a 2 GB buffer, 
> an error is thrown. 
>  
> {code:java}
> /// Compresses a serialized catalog object using LZ4 and stores it back in 
> 'dst'. Stores
> /// the size of the uncompressed catalog object in the first sizeof(uint32_t) 
> bytes of
> /// 'dst'. The compression fails if the uncompressed data size exceeds 
> 0x7E00 bytes.
> Status CompressCatalogObject(const uint8_t* src, uint32_t size, std::string* 
> dst)
> WARN_UNUSED_RESULT;
> {code}
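The size guard and on-the-wire layout that comment describes (4-byte uncompressed-size prefix, then the compressed payload) can be sketched in a few lines of Python. This is an illustrative model only: zlib stands in for LZ4, and the size limit is a placeholder since the exact hex constant is truncated in the quoted comment.

```python
import struct
import zlib  # stand-in for LZ4, which the C++ code actually uses

# Placeholder for the ~2 GB limit described above (the exact constant is
# truncated in the quoted comment).
MAX_UNCOMPRESSED_BYTES = 2**31 - 1

def compress_catalog_object(src: bytes) -> bytes:
    """Prefix the compressed payload with the uncompressed size in the first
    4 bytes, mirroring the layout CompressCatalogObject() describes."""
    if len(src) > MAX_UNCOMPRESSED_BYTES:
        raise ValueError("uncompressed catalog object exceeds the size limit")
    return struct.pack("<I", len(src)) + zlib.compress(src)

def decompress_catalog_object(dst: bytes) -> bytes:
    """Invert the layout: read the size prefix, then decompress and verify."""
    (size,) = struct.unpack("<I", dst[:4])
    out = zlib.decompress(dst[4:])
    assert len(out) == size
    return out

payload = b"catalog-object" * 1000
assert decompress_catalog_object(compress_catalog_object(payload)) == payload
```

The bug discussed below is precisely about what happens when that `ValueError`-equivalent path is taken in C++ and the caller ignores the returned status.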
>  
> CatalogServer::AddPendingTopicItem() calls CompressCatalogObject()
>  
> {code:java}
> // Add a catalog update to pending_topic_updates_.
> extern "C"
> JNIEXPORT jboolean JNICALL
> Java_org_apache_impala_service_FeSupport_NativeAddPendingTopicItem(JNIEnv* env,
>     jclass caller_class, jlong native_catalog_server_ptr, jstring key, jlong version,
>     jbyteArray serialized_object, jboolean deleted) {
>   std::string key_string;
>   {
>     JniUtfCharGuard key_str;
>     if (!JniUtfCharGuard::create(env, key, &key_str).ok()) {
>       return static_cast<jboolean>(false);
>     }
>     key_string.assign(key_str.get());
>   }
>   JniScopedArrayCritical obj_buf;
>   if (!JniScopedArrayCritical::Create(env, serialized_object, &obj_buf)) {
>     return static_cast<jboolean>(false);
>   }
>   reinterpret_cast<CatalogServer*>(native_catalog_server_ptr)->
>       AddPendingTopicItem(std::move(key_string), version, obj_buf.get(),
>           static_cast<uint32_t>(obj_buf.size()), deleted);
>   return static_cast<jboolean>(true);
> }
> {code}
> However the JNI call to AddPendingTopicItem discards the return value.
> Recently the return value was maintained due to IMPALA-10076:
> {code:java}
> -if (!FeSupport.NativeAddPendingTopicItem(nativeCatalogServerPtr, 
> v1Key,
> -obj.catalog_version, data, delete)) {
> +int actualSize = 
> FeSupport.NativeAddPendingTopicItem(nativeCatalogServerPtr,
> +v1Key, obj.catalog_version, data, delete);
> +if (actualSize < 0) {
>LOG.error("NativeAddPendingTopicItem failed in BE. key=" + v1Key + 
> ", delete="
>+ delete + ", data_size=" + data.length);
> +} else if (summary != null && obj.type == HDFS_PARTITION) {
> +  summary.update(true, delete, obj.hdfs_partition.partition_name,
> +  obj.catalog_version, data.length, actualSize);
>  }
>}
> {code}
> CatalogServiceCatalog::addCatalogObject() now produces an error message but 
> the Catalog update doesn't go through.
> {code:java}
>   if (topicMode_ == TopicMode.FULL || topicMode_ == TopicMode.MIXED) {
> String v1Key = CatalogServiceConstants.CATALOG_TOPIC_V1_PREFIX + key;
> byte[] data = serializer.serialize(obj);
> int actualSize = 
> FeSupport.NativeAddPendingTopicItem(nativeCatalogServerPtr,
> v1Key, obj.catalog_version, data, delete);
> if (actualSize < 0) {
>   LOG.error("NativeAddPendingTopicItem failed in BE. key=" + v1Key + 
> ", delete="
>   + delete + ", data_size=" + data.length);
> } else if (summary != null && obj.type == HDFS_PARTITION) {
>   summary.update(true, delete, obj.hdfs_partition.partition_name,
>   obj.catalog_version, data.length, actualSize);
> }
>   }
> {code}
> Not sure what the right behavior would be; we could handle the compression 
> issue and try more aggressive compression, or unblock the catalog update.
>  






[jira] [Updated] (IMPALA-9375) Remove DirectMetaProvider usage from CatalogMetaProvider

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-9375:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Remove DirectMetaProvider usage from CatalogMetaProvider
> 
>
> Key: IMPALA-9375
> URL: https://issues.apache.org/jira/browse/IMPALA-9375
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 3.4.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>
> I see that CatalogMetaProvider uses {{DirectMetaProvider}} here 
> https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L239
> There are only a couple of places where it is used within 
> CatalogMetaProvider. We should implement those remaining APIs in catalog-v2 
> mode and remove the usage of DirectMetaProvider from CatalogMetaProvider. 
> DirectMetaProvider starts by default a MetastoreClientPool (with 10 
> connections). This is unnecessary given that catalog already makes the 
> connections to HMS at its startup. It also slows down the coordinator startup 
> time if there are HMS connection issues.






[jira] [Updated] (IMPALA-11737) impala-shell does not work with Python 3.10

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11737:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> impala-shell does not work with Python 3.10
> ---
>
> Key: IMPALA-11737
> URL: https://issues.apache.org/jira/browse/IMPALA-11737
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 4.1.1
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
>  Labels: python3
>
> Attempting to install impala-shell under Ubuntu 22.04, which defaults to 
> Python 3.10, runs into an error installing sasl==0.2.1:
> {code}
> $ pip3 install impala-shell
> Collecting impala-shell
>   Downloading impala_shell-4.1.1.tar.gz (485 kB)
>   485.7/485.7 KB 5.2 MB/s eta 
> 0:00:00
>   Preparing metadata (setup.py) ... done
> Collecting bitarray==2.3.0
>   Downloading bitarray-2.3.0.tar.gz (87 kB)
>   87.1/87.1 KB 27.9 MB/s eta 
> 0:00:00
>   Preparing metadata (setup.py) ... done
> Collecting configparser==4.0.2
>   Downloading configparser-4.0.2-py2.py3-none-any.whl (22 kB)
> Collecting prettytable==0.7.2
>   Downloading prettytable-0.7.2.zip (28 kB)
>   Preparing metadata (setup.py) ... done
> Collecting sasl==0.2.1
>   Using cached sasl-0.2.1.tar.gz (30 kB)
>   Preparing metadata (setup.py) ... done
> Requirement already satisfied: setuptools>=36.8.0 in 
> /usr/lib/python3/dist-packages (from impala-shell) (59.6.0)
> Collecting six==1.14.0
>   Downloading six-1.14.0-py2.py3-none-any.whl (10 kB)
> Collecting sqlparse==0.3.1
>   Downloading sqlparse-0.3.1-py2.py3-none-any.whl (40 kB)
>   40.8/40.8 KB 11.8 MB/s eta 
> 0:00:00
> Collecting thrift==0.11.0
>   Downloading thrift-0.11.0.tar.gz (52 kB)
>   52.5/52.5 KB 9.9 MB/s eta 
> 0:00:00
>   Preparing metadata (setup.py) ... done
> Collecting thrift_sasl==0.4.3
>   Downloading thrift_sasl-0.4.3-py2.py3-none-any.whl (8.3 kB)
> Collecting pure-sasl>=0.6.2
>   Downloading pure-sasl-0.6.2.tar.gz (11 kB)
>   Preparing metadata (setup.py) ... done
> Building wheels for collected packages: impala-shell, bitarray, prettytable, 
> sasl, thrift, pure-sasl
>   Building wheel for impala-shell (setup.py) ... done
>   Created wheel for impala-shell: 
> filename=impala_shell-4.1.1-py3-none-any.whl size=569635 
> sha256=6e1c2a77496b3ff805f94981c5337a31e5c29234b5e6144bccb3e440255e43f4
>   Stored in directory: 
> /root/.cache/pip/wheels/4e/5a/e2/a9b42d2d1e631e017d255589252dfca4f551d82f35c085c66b
>   Building wheel for bitarray (setup.py) ... done
>   Created wheel for bitarray: 
> filename=bitarray-2.3.0-cp310-cp310-linux_x86_64.whl size=180256 
> sha256=3585d775cd448af1d89fe49a11ceb5232785bfe2e6ea8a787020d9bc3c70943f
>   Stored in directory: 
> /root/.cache/pip/wheels/41/86/54/5f5554b3dd06b7be12ae12f9826c8271cc88b16d2a46b689db
>   Building wheel for prettytable (setup.py) ... done
>   Created wheel for prettytable: filename=prettytable-0.7.2-py3-none-any.whl 
> size=13714 
> sha256=21804d294eb39d66ad8974f8a8ac3b761808f3d1991aa3c4239722c52b2add22
>   Stored in directory: 
> /root/.cache/pip/wheels/25/4b/07/18c5d92824315576e478206ea69df34a9e31958f6143eb0e31
>   Building wheel for sasl (setup.py) ... error
>   error: subprocess-exited-with-error
>   
>   × python setup.py bdist_wheel did not run successfully.
>   │ exit code: 1
>   ╰─> [170 lines of output]
>   running bdist_wheel
>   running build
>   running build_py
>   creating build
>   creating build/lib.linux-x86_64-3.10
>   creating build/lib.linux-x86_64-3.10/sasl
>   copying sasl/__init__.py -> build/lib.linux-x86_64-3.10/sasl
>   running egg_info
>   writing sasl.egg-info/PKG-INFO
>   writing dependency_links to sasl.egg-info/dependency_links.txt
>   writing requirements to sasl.egg-info/requires.txt
>   writing top-level names to sasl.egg-info/top_level.txt
>   reading manifest file 'sasl.egg-info/SOURCES.txt'
>   reading manifest template 'MANIFEST.in'
>   adding license file 'LICENSE.txt'
>   writing manifest file 'sasl.egg-info/SOURCES.txt'
>   copying sasl/saslwrapper.cpp -> build/lib.linux-x86_64-3.10/sasl
>   copying sasl/saslwrapper.h -> build/lib.linux-x86_64-3.10/sasl
>   copying sasl/saslwrapper.pyx -> build/lib.linux-x86_64-3.10/sasl
>   running build_ext
>   building 'sasl.saslwrapper' extension
>   creating build/temp.linux-x86_64-3.10
>   creating build/temp.linux-x86_64-3.10/sasl
>   x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g 
> -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat 
> 

[jira] [Updated] (IMPALA-6294) Concurrent hung with lots of spilling make slow progress due to blocking in DataStreamRecvr and DataStreamSender

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-6294:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Concurrent hung with lots of spilling make slow progress due to blocking in 
> DataStreamRecvr and DataStreamSender
> 
>
> Key: IMPALA-6294
> URL: https://issues.apache.org/jira/browse/IMPALA-6294
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.11.0
>Reporter: Mostafa Mokhtar
>Assignee: Michael Ho
>Priority: Critical
> Attachments: IMPALA-6285 TPCDS Q3 slow broadcast, 
> slow_broadcast_q3_reciever.txt, slow_broadcast_q3_sender.txt
>
>
> While running a highly concurrent spilling workload on a large cluster, 
> queries start running slower; even lightweight queries that are not spilling 
> are affected by this slowdown. 
> {code}
>   EXCHANGE_NODE (id=9):(Total: 3m1s, non-child: 3m1s, % non-child: 
> 100.00%)
>  - ConvertRowBatchTime: 999.990us
>  - PeakMemoryUsage: 0
>  - RowsReturned: 108.00K (108001)
>  - RowsReturnedRate: 593.00 /sec
> DataStreamReceiver:
>   BytesReceived(4s000ms): 254.47 KB, 338.82 KB, 338.82 KB, 852.43 
> KB, 1.32 MB, 1.33 MB, 1.50 MB, 2.53 MB, 2.99 MB, 3.00 MB, 3.00 MB, 3.00 MB, 
> 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.16 MB, 3.49 MB, 3.80 
> MB, 4.15 MB, 4.55 MB, 4.84 MB, 4.99 MB, 5.07 MB, 5.41 MB, 5.75 MB, 5.92 MB, 
> 6.00 MB, 6.00 MB, 6.00 MB, 6.07 MB, 6.28 MB, 6.33 MB, 6.43 MB, 6.67 MB, 6.91 
> MB, 7.29 MB, 8.03 MB, 9.12 MB, 9.68 MB, 9.90 MB, 9.97 MB, 10.44 MB, 11.25 MB
>- BytesReceived: 11.73 MB (12301692)
>- DeserializeRowBatchTimer: 957.990ms
>- FirstBatchArrivalWaitTime: 0.000ns
>- PeakMemoryUsage: 644.44 KB (659904)
>- SendersBlockedTimer: 0.000ns
>- SendersBlockedTotalTimer(*): 0.000ns
> {code}
> {code}
> DataStreamSender (dst_id=9):(Total: 1s819ms, non-child: 1s819ms, % 
> non-child: 100.00%)
>- BytesSent: 234.64 MB (246033840)
>- NetworkThroughput(*): 139.58 MB/sec
>- OverallThroughput: 128.92 MB/sec
>- PeakMemoryUsage: 33.12 KB (33920)
>- RowsReturned: 108.00K (108001)
>- SerializeBatchTime: 133.998ms
>- TransmitDataRPCTime: 1s680ms
>- UncompressedRowBatchSize: 446.42 MB (468102200)
> {code}
> Timeouts seen in IMPALA-6285 are caused by this issue
> {code}
> I1206 12:44:14.925405 25274 status.cc:58] RPC recv timed out: Client 
> foo-17.domain.com:22000 timed-out during recv call.
> @   0x957a6a  impala::Status::Status()
> @  0x11dd5fe  
> impala::DataStreamSender::Channel::DoTransmitDataRpc()
> @  0x11ddcd4  
> impala::DataStreamSender::Channel::TransmitDataHelper()
> @  0x11de080  impala::DataStreamSender::Channel::TransmitData()
> @  0x11e1004  impala::ThreadPool<>::WorkerThread()
> @   0xd10063  impala::Thread::SuperviseThread()
> @   0xd107a4  boost::detail::thread_data<>::run()
> @  0x128997a  (unknown)
> @ 0x7f68c5bc7e25  start_thread
> @ 0x7f68c58f534d  __clone
> {code}
> A similar behavior was also observed with KRPC enabled (IMPALA-6048).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7471) Impala crashes or returns incorrect results when querying parquet nested types

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-7471:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Impala crashes or returns incorrect results when querying parquet nested types
> --
>
> Key: IMPALA-7471
> URL: https://issues.apache.org/jira/browse/IMPALA-7471
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Csaba Ringhofer
>Priority: Critical
>  Labels: complextype, correctness, crash, parquet
> Attachments: test_users_131786401297925138_0.parquet
>
>
> From 
> http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Impala-bug-with-nested-arrays-of-structures-where-some-of/m-p/78507/highlight/false#M4779
> {quote}We found a case where Impala returns incorrect values from a simple 
> query. Our data contains a nested array of structures, and the structures 
> contain other structures.
> We generated minimal sample data that allows reproducing the issue.
>  
> SQL to create the table:
> {quote}
> {code}
> CREATE TABLE plat_test.test_users (
>   id INT,
>   name STRING,   
>   devices ARRAY<
> STRUCT<
>   id:STRING,
>   device_info:STRUCT<
> model:STRING
>   >
> >
>   >
> )
> STORED AS PARQUET
> {code}
> {quote}
> Please put the attached parquet file in the table's location and refresh the 
> table.
> In the sample data we have 2 users, one with 2 devices and the second with 3. 
> Some of the devices.device_info.model fields are NULL.
>  
> When I issue a query:
> {quote}
> {code}
> SELECT u.name, d.device_info.model as model
> FROM test_users u,
> u.devices d;
> {code}
>  {quote}
> I'm expecting to get 5 records in the results, but I'm getting only one 
> (screenshot: 1.png).
> If I change query to:
>  {quote}
> {code}
> SELECT u.name, d.device_info.model as model
> FROM test_users u
> LEFT OUTER JOIN u.devices d;
>  {code}
> {quote}
> I'm getting two records in the results, but still not as many as there 
> should be.
> We found a workaround for this problem: if we add the device.id column to the 
> result columns, we get all records from the parquet file:
> {quote}
> {code}
> SELECT u.name, d.id, d.device_info.model as model
> FROM test_users u
> , u.devices d
>  {code}
> {quote}
> And the result is as expected (screenshot: 3.png).
>  
> But we can't rely on this workaround, because we don't need device.id in all 
> queries and Impala optimizes it away, and as a result we get unpredictable 
> results.
>  
> I tested a Hive query on this table and it returns the expected results:
> {quote}
> {code}
> SELECT u.name, d.device_info.model
> FROM test_users u
> lateral view outer inline (u.devices) d;
>  {code}
> {quote}
> results: (screenshot: 4.png)
> Please advise whether it's a problem in the Impala engine or a mistake in 
> our query.
>  
> Best regards,
> Come2Play team.
> {quote}






[jira] [Updated] (IMPALA-9684) Update query option levels

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-9684:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Update query option levels
> --
>
> Key: IMPALA-9684
> URL: https://issues.apache.org/jira/browse/IMPALA-9684
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Reporter: Tim Armstrong
>Priority: Critical
>
> We've accumulated a lot of options and many of them are labelled as more 
> important than they really are.
> * The basic level should only include options that change user-facing 
> behaviour. Query tuning, e.g. anything related to runtime filters, should be 
> in advanced.
> * "Chicken bits" for stable features should be moved to DEPRECATED. E.g. 
> PARQUET_READ_PAGE_INDEX
> Tuning options that are not commonly used should be moved to deprecated or 
> development (depending on whether we might want to remove them, or whether we 
> might retain them for new features). E.g. PREFETCH_MODE
> {noformat}
> Query options (defaults shown in []):
>   ABORT_ON_ERROR: [0]
>   COMPRESSION_CODEC: []
>   DEFAULT_FILE_FORMAT: [TEXT]
>   DEFAULT_HINTS_INSERT_STATEMENT: []
>   DISABLE_CODEGEN: [0]
>   DISABLE_HDFS_NUM_ROWS_ESTIMATE: [0]
>   DISABLE_ROW_RUNTIME_FILTERING: [0]
>   DISABLE_STREAMING_PREAGGREGATIONS: [0]
>   DISABLE_UNSAFE_SPILLS: [0]
>   EXEC_TIME_LIMIT_S: [0]
>   EXPLAIN_LEVEL: [STANDARD]
>   IDLE_SESSION_TIMEOUT: [0]
>   MAX_ROW_SIZE: [524288]
>   MAX_STATEMENT_LENGTH_BYTES: [16777216]
>   MEM_LIMIT: [0]
>   MT_DOP: []
>   NUM_SCANNER_THREADS: [0]
>   OPTIMIZE_PARTITION_KEY_SCANS: [0]
>   PARQUET_ARRAY_RESOLUTION: [THREE_LEVEL]
>   PARQUET_FALLBACK_SCHEMA_RESOLUTION: [POSITION]
>   QUERY_TIMEOUT_S: [0]
>   REQUEST_POOL: []
>   RUNTIME_FILTER_MODE: [GLOBAL]
>   RUNTIME_FILTER_WAIT_TIME_MS: [0]
>   S3_SKIP_INSERT_STAGING: [1]
>   SCRATCH_LIMIT: [-1]
>   STATEMENT_EXPRESSION_LIMIT: [25]
>   SYNC_DDL: [0]
>   THREAD_RESERVATION_AGGREGATE_LIMIT: [0]
>   THREAD_RESERVATION_LIMIT: [3000]
>   TIMEZONE: [America/Los_Angeles]
> Advanced Query Options:
>   APPX_COUNT_DISTINCT: [0]
>   BROADCAST_BYTES_LIMIT: [34359738368]
>   BUFFER_POOL_LIMIT: []
>   CLIENT_IDENTIFIER: Impala Shell v4.0.0-SNAPSHOT (494418c) built on Mon 
> Apr 20 22:53:57 PDT 2020
>   COMPUTE_STATS_MIN_SAMPLE_SIZE: [1073741824]
>   DEFAULT_JOIN_DISTRIBUTION_MODE: [BROADCAST]
>   DEFAULT_SPILLABLE_BUFFER_SIZE: [2097152]
>   DEFAULT_TRANSACTIONAL_TYPE: [NONE]
>   DISABLE_CODEGEN_ROWS_THRESHOLD: [5]
>   DISABLE_DATA_CACHE: [0]
>   DISABLE_HBASE_NUM_ROWS_ESTIMATE: [0]
>   ENABLE_CNF_REWRITES: [0]
>   ENABLE_EXPR_REWRITES: [1]
>   EXEC_SINGLE_NODE_ROWS_THRESHOLD: [100]
>   FETCH_ROWS_TIMEOUT_MS: [1]
>   HBASE_CACHE_BLOCKS: [0]
>   HBASE_CACHING: [0]
>   KUDU_READ_MODE: [DEFAULT]
>   KUDU_SNAPSHOT_READ_TIMESTAMP_MICROS: [0]
>   MAX_CNF_EXPRS: [0]
>   MAX_ERRORS: [100]
>   MAX_MEM_ESTIMATE_FOR_ADMISSION: [0]
>   MAX_NUM_RUNTIME_FILTERS: [10]
>   MIN_SPILLABLE_BUFFER_SIZE: [65536]
>   NUM_REMOTE_EXECUTOR_CANDIDATES: [3]
>   NUM_ROWS_PRODUCED_LIMIT: [0]
>   PARQUET_ANNOTATE_STRINGS_UTF8: [0]
>   PARQUET_DICTIONARY_FILTERING: [1]
>   PARQUET_FILE_SIZE: [0]
>   PARQUET_OBJECT_STORE_SPLIT_SIZE: [268435456]
>   PARQUET_PAGE_ROW_COUNT_LIMIT: []
>   PARQUET_READ_PAGE_INDEX: [1]
>   PARQUET_READ_STATISTICS: [1]
>   PARQUET_WRITE_PAGE_INDEX: [1]
>   PREAGG_BYTES_LIMIT: [-1]
>   PREFETCH_MODE: [HT_BUCKET]
>   REPLICA_PREFERENCE: [CACHE_LOCAL]
>   RESOURCE_TRACE_RATIO: [0.00]
>   RUNTIME_BLOOM_FILTER_SIZE: [1048576]
>   RUNTIME_FILTER_MAX_SIZE: [16777216]
>   RUNTIME_FILTER_MIN_SIZE: [1048576]
>   SCAN_BYTES_LIMIT: [0]
>   SCHEDULE_RANDOM_REPLICA: [0]
>   SHUFFLE_DISTINCT_EXPRS: [1]
>   SUPPORT_START_OVER: [false]
>   TOPN_BYTES_LIMIT: [536870912]
> Shell Options
>   WRITE_DELIMITED: False
>   VERBOSE: True
>   LIVE_SUMMARY: False
>   OUTPUT_FILE: None
>   DELIMITER: \t
>   LIVE_PROGRESS: False
> Variables:
>   No variables defined.
> {noformat}






[jira] [Updated] (IMPALA-2422) % escaping does not work correctly in a LIKE clause

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-2422:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> % escaping does not work correctly in a LIKE clause
> ---
>
> Key: IMPALA-2422
> URL: https://issues.apache.org/jira/browse/IMPALA-2422
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Frontend
>Affects Versions: Impala 2.2.4, Impala 2.3.0, Impala 2.5.0, Impala 2.4.0, 
> Impala 2.6.0, Impala 2.7.0
>Reporter: Huaisi Xu
>Priority: Critical
>  Labels: correctness, downgraded, incompatibility
>
> {code:java}
> [localhost:21000] > select '%' like "\%";
> Query: select '%' like "\%"
> +---------------+
> | '%' like '\%' |
> +---------------+
> | false         |   -> should return true.
> +---------------+
> Fetched 1 row(s) in 0.01s
> {code}
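[Editor's note] The expected semantics are that a backslash-escaped `%` in a LIKE pattern matches a literal percent sign rather than acting as a wildcard. A minimal sketch of that translation to a Java regex — an illustrative helper (`likeToRegex` is a hypothetical name), not Impala's actual backend implementation:

```java
class LikeEscape {
    // Translate a SQL LIKE pattern (with backslash escapes) to a Java regex.
    // An escaped '%' or '_' matches the literal character; an unescaped '%'
    // matches any sequence and '_' matches any single character.
    static String likeToRegex(String pattern) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                // Escaped character: match it literally.
                sb.append(java.util.regex.Pattern.quote(String.valueOf(pattern.charAt(++i))));
            } else if (c == '%') {
                sb.append(".*");
            } else if (c == '_') {
                sb.append('.');
            } else {
                sb.append(java.util.regex.Pattern.quote(String.valueOf(c)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // With correct escaping, '%' LIKE '\%' is true:
        System.out.println("%".matches(likeToRegex("\\%")));  // true
    }
}
```

Under this translation, `'%' like "\%"` evaluates to true, which is the behavior the issue requests.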






[jira] [Updated] (IMPALA-10040) Crash on UnionNode when codegen is disabled

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-10040:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Crash on UnionNode when codegen is disabled
> ---
>
> Key: IMPALA-10040
> URL: https://issues.apache.org/jira/browse/IMPALA-10040
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: crash
>
> Saw a crash when running a UNION query with codegen disabled:
> {code}
> F0803 15:37:44.551749 24805 union-node-ir.cc:26] 
> fd41196430b5c449:0a195a250006] Check failed: !dst_batch->AtCapacity() 
> *** Check failure stack trace: *** 
> @  0x514aa8c  google::LogMessage::Fail()
> @  0x514c37c  google::LogMessage::SendToLog()
> @  0x514a3ea  google::LogMessage::Flush()
> @  0x514dfe8  google::LogMessageFatal::~LogMessageFatal()
> @  0x286c323  impala::UnionNode::MaterializeExprs()
> @  0x286c983  impala::UnionNode::MaterializeBatch()
> @  0x286798a  impala::UnionNode::GetNextMaterialized()
> @  0x2868ac4  impala::UnionNode::GetNext()
> @  0x225f77c  impala::FragmentInstanceState::ExecInternal()
> @  0x225be20  impala::FragmentInstanceState::Exec()
> @  0x2285c35  impala::QueryState::ExecFInstance()
> @  0x2284037  
> _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
> @  0x22877d6  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x2053061  boost::function0<>::operator()()
> @  0x2676bff  impala::Thread::SuperviseThread()
> @  0x267eb9c  boost::_bi::list5<>::operator()<>()
> @  0x267eac0  boost::_bi::bind_t<>::operator()()
> @  0x267ea81  boost::detail::thread_data<>::run()
> @  0x3e514e1  thread_proxy
> @ 0x7f6575c326b9  start_thread
> @ 0x7f65727fe4dc  clone
> {code}
> The query is
> {code}
> I0803 15:37:44.273838 24616 Frontend.java:1508] 
> fd41196430b5c449:0a195a25] Analyzing query: create table my_bigstrs 
> stored as parquet as
> select *, repeat(string_col, 10) as bigstr
> from functional.alltypes
> order by id
> limit 10
> union all
> select *, repeat(string_col, 1000) as bigstr
> from functional.alltypes
> order by id
> limit 10 db: default
> {code}






[jira] [Updated] (IMPALA-8521) Lots of "unreleased ByteBuffers allocated by read()" errors from HDFS client

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-8521:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Lots of "unreleased ByteBuffers allocated by read()" errors from HDFS client
> 
>
> Key: IMPALA-8521
> URL: https://issues.apache.org/jira/browse/IMPALA-8521
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Sahil Takiar
>Priority: Critical
>
> I'm looking at some job logs and seeing a bunch of errors like this. I don't 
> know if it's benign or if it's something more serious.
> {noformat}
> I0507 07:34:53.934693 20195 scan-range.cc:607] 
> dd4d6eb8d2ad9587:6b44fe1b0002] Cache read failed for scan range: 
> file=hdfs://localhost:20500/test-warehouse/f861f1a3/nation.tbl disk_id=0 
> offset=1024  exclusive_hdfs_fh=0xec09220 num_remote_bytes=0 cancel_status= 
> buffer_queue=0 num_buffers_in_readers=0 unused_iomgr_buffers=0 
> unused_iomgr_buffer_bytes=0 blocked_on_buffer=0. Switching to disk read path.
> W0507 07:34:53.934787 20195 DFSInputStream.java:668] 
> dd4d6eb8d2ad9587:6b44fe1b0002] closing file 
> /test-warehouse/f861f1a3/nation.tbl, but there are still unreleased 
> ByteBuffers allocated by read().  Please release 
> java.nio.DirectByteBufferR[pos=1024 lim=2048 cap=2199].
> {noformat}






[jira] [Updated] (IMPALA-10966) query_test.test_scanners.TestIceberg.test_iceberg_query multiple failures in an ASAN run

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-10966:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> query_test.test_scanners.TestIceberg.test_iceberg_query multiple failures in 
> an ASAN run
> 
>
> Key: IMPALA-10966
> URL: https://issues.apache.org/jira/browse/IMPALA-10966
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.1.0
>Reporter: Laszlo Gaal
>Priority: Critical
>  Labels: broken-build, iceberg
>
> The actual failures look pretty similar.
> Pattern 1:
> {code}
> query_test/test_scanners.py:357: in test_iceberg_query 
> self.run_test_case('QueryTest/iceberg-query', vector) 
> common/impala_test_suite.py:713: in run_test_case 
> self.__verify_results_and_errors(vector, test_section, result, use_db) 
> common/impala_test_suite.py:549: in __verify_results_and_errors 
> replace_filenames_with_placeholder) common/test_result_verifier.py:469: in 
> verify_raw_results VERIFIER_MAP[verifier](expected, actual) 
> common/test_result_verifier.py:278: in verify_query_result_is_equal 
> assert expected_results == actual_results E   assert Comparing 
> QueryTestResults (expected vs actual): E 
> 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/iceberg_partitioned_orc/functional_parquet/iceberg_partitioned_orc/data/action=click/4-4-0982a5d3-48c0-4dd0-ab87-d24190894251-0.orc',regex:.*,''
>  == 
> 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/iceberg_partitioned_orc/functional_parquet/iceberg_partitioned_orc/data/action=click/4-4-0982a5d3-48c0-4dd0-ab87-d24190894251-0.orc','460B',''
>  E 
> 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/iceberg_partitioned_orc/functional_parquet/iceberg_partitioned_orc/data/action=click/00014-14-dc56d2c8-e285-428d-b81e-f3d07ec53c12-0.orc',regex:.*,''
>  == 
> 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/iceberg_partitioned_orc/functional_parquet/iceberg_partitioned_orc/data/action=click/00014-14-dc56d2c8-e285-428d-b81e-f3d07ec53c12-0.orc','460B',''
> [. matching result lines elided.]
> E 
> 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/iceberg_partitioned_orc/functional_parquet/iceberg_partitioned_orc/metadata/version-hint.text',regex:.*,''
>  != 
> 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/iceberg_partitioned_orc/functional_parquet/iceberg_partitioned_orc/metadata/v3.metadata.json','2.21KB',''
>  
> E None != 
> 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/iceberg_partitioned_orc/functional_parquet/iceberg_partitioned_orc/metadata/v4.metadata.json','2.44KB',''
>  
> E None != 
> 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/iceberg_partitioned_orc/functional_parquet/iceberg_partitioned_orc/metadata/v5.metadata.json','2.66KB',''
>  
> E None != 
> 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/iceberg_partitioned_orc/functional_parquet/iceberg_partitioned_orc/metadata/version-hint.text','1B',''
>  
> E Number of rows returned (expected vs actual): 25 != 28
> {code}
> Pattern 2:
> {code}
> query_test/test_scanners.py:357: in test_iceberg_query
> self.run_test_case('QueryTest/iceberg-query', vector)
> common/impala_test_suite.py:713: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:549: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:469: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E 
> 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/iceberg_partitioned_orc/functional_parquet/iceberg_partitioned_orc/data/action=click/4-4-0982a5d3-48c0-4dd0-ab87-d24190894251-0.orc',regex:.*,''
>  == 
> 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/iceberg_partitioned_orc/functional_parquet/iceberg_partitioned_orc/data/action=click/4-4-0982a5d3-48c0-4dd0-ab87-d24190894251-0.orc','460B',''
> [.matching result lines elided...]
> E 
> 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/iceberg_partitioned_orc/functional_parquet/iceberg_partitioned_orc/metadata/version-hint.text',regex:.*,''
>  != 
> 

[jira] [Updated] (IMPALA-11424) Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11424:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Support pushdown non-equi join predicate from OUTER/INNER JOIN  to SCANNODE
> ---
>
> Key: IMPALA-11424
> URL: https://issues.apache.org/jira/browse/IMPALA-11424
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Baike Xia
>Assignee: Baike Xia
>Priority: Major
>
> To reduce the amount of data read and transmitted, the non-equi join 
> condition can be pushed down to the SCAN_NODE.
> For example:
> {code:java}
> // code placeholder
> select count(1) from (select ss.ss_ticket_number FROM store_sales ss LEFT 
> OUTER JOIN store_returns sr ON (sr.sr_item_sk = ss.ss_item_sk AND 
> sr.sr_ticket_number >= ss.ss_ticket_number) where ss.ss_sold_date_sk = 
> 2450816) t where t.ss_ticket_number = 79577; {code}
> Current plan:
> {code:java}
> // code placeholder
> PLAN-ROOT SINK
> |
> 07:AGGREGATE [FINALIZE]
> |  output: count:merge(*)
> |  row-size=8B cardinality=1
> |
> 06:EXCHANGE [UNPARTITIONED]
> |
> 03:AGGREGATE
> |  output: count(*)
> |  row-size=8B cardinality=1
> |
> 02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
> |  hash predicates: sr.sr_item_sk = ss.ss_item_sk
> |  other join predicates: sr.sr_ticket_number >= ss.ss_ticket_number
> |  runtime filters: RF000 <- ss.ss_item_sk
> |  row-size=32B cardinality=16
> |
> |--05:EXCHANGE [HASH(ss.ss_item_sk)]
> |  |
> |  00:SCAN HDFS [tpcds_parquet.store_sales ss]
> |     partition predicates: ss.ss_sold_date_sk = 2450816
> |     partitions=1/1824 files=1 size=70.77KB
> |     predicates: ss.ss_ticket_number = 79577
> |     row-size=16B cardinality=1
> |
> 04:EXCHANGE [HASH(sr.sr_item_sk)]
> |
> 01:SCAN HDFS [tpcds_parquet.store_returns sr]
>    partitions=1/1 files=1 size=15.42MB
>    runtime filters: RF000 -> sr.sr_item_sk
>    row-size=16B cardinality=287.51K{code}
> After Pushdown:
> {code:java}
> // code placeholder
> PLAN-ROOT SINK
> |
> 07:AGGREGATE [FINALIZE]
> |  output: count:merge(*)
> |  row-size=8B cardinality=1
> |
> 06:EXCHANGE [UNPARTITIONED]
> |
> 03:AGGREGATE
> |  output: count(*)
> |  row-size=8B cardinality=1
> |
> 02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
> |  hash predicates: sr.sr_item_sk = ss.ss_item_sk
> |  other join predicates: sr.sr_ticket_number >= ss.ss_ticket_number
> |  runtime filters: RF000 <- ss.ss_item_sk
> |  row-size=32B cardinality=16
> |
> |--05:EXCHANGE [HASH(ss.ss_item_sk)]
> |  |
> |  00:SCAN HDFS [tpcds_parquet.store_sales ss]
> |     partition predicates: ss.ss_sold_date_sk = 2450816
> |     partitions=1/1824 files=1 size=70.77KB
> |     predicates: ss.ss_ticket_number = 79577
> |     row-size=16B cardinality=1
> |
> 04:EXCHANGE [HASH(sr.sr_item_sk)]
> |
> 01:SCAN HDFS [tpcds_parquet.store_returns sr]
>    partitions=1/1 files=1 size=15.42MB
>    predicates: sr.sr_ticket_number >= 79577
>    runtime filters: RF000 -> sr.sr_item_sk
>    row-size=16B cardinality=28.75K {code}
>  
> For pushdown of join non-equi conjuncts, the current restrictions are:
> 1. Only LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN, and INNER_JOIN are supported;
> 2. Only non-equi predicates containing a LiteralExpr are valid,
> for example: slot >= literal, slot IN (literal list);
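[Editor's note] The pushed predicate in the "After Pushdown" plan follows from transitivity: ss.ss_ticket_number = 79577 combined with sr.sr_ticket_number >= ss.ss_ticket_number implies sr.sr_ticket_number >= 79577 on the store_returns scan. A minimal sketch of that inference over a hypothetical string-level predicate representation — not Impala's actual planner API:

```java
import java.util.Optional;

class TransitivePushdown {
    // If slot b is bound to a literal by another conjunct (b = k) and the join
    // predicate is "a <op> b" with a monotone comparison (>=, >, <=, <),
    // derive "a <op> k" as a scan-level filter on a's table.
    static Optional<String> derive(String joinPred, String boundSlot, long literal) {
        String[] parts = joinPred.split(" ");  // e.g. {"a", ">=", "b"}
        if (parts.length == 3 && parts[2].equals(boundSlot)) {
            return Optional.of(parts[0] + " " + parts[1] + " " + literal);
        }
        return Optional.empty();  // right-hand slot is not the bound one
    }

    public static void main(String[] args) {
        // ss.ss_ticket_number = 79577 plus the non-equi join predicate
        // yields the scan predicate shown in the "After Pushdown" plan.
        System.out.println(derive(
            "sr.sr_ticket_number >= ss.ss_ticket_number",
            "ss.ss_ticket_number", 79577L).get());
        // sr.sr_ticket_number >= 79577
    }
}
```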






[jira] [Updated] (IMPALA-11696) Incorrect warnings when creating text/sequence table with row format delimiters

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11696:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Incorrect warnings when creating text/sequence table with row format 
> delimiters
> ---
>
> Key: IMPALA-11696
> URL: https://issues.apache.org/jira/browse/IMPALA-11696
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.1.0, Impala 4.1.1
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> IMPALA-9822 adds a warning when the customized row format delimiters in the 
> CreateTable statement are ignored. However, we see the same warning even when 
> creating TEXT or SEQUENCE_FILE tables, where the delimiters are used.
> {code:sql}
> [localhost:21050] default> create external table my_csv (a int, b string) ROW 
> FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as textfile;
> Query: create external table my_csv (a int, b string) ROW FORMAT DELIMITED 
> FIELDS TERMINATED BY ',' stored as textfile
> +-+
> | summary                 |
> +-+
> | Table has been created. |
> +-+
> WARNINGS: 'ROW FORMAT DELIMITED FIELDS TERMINATED BY ','' is ignored.
> Fetched 1 row(s) in 0.11s {code}
> The following code is buggy:
> {code:java}
> if (getRowFormat() != null) {
>   String fieldDelimiter = getRowFormat().getFieldDelimiter();
>   String lineDelimiter = getRowFormat().getLineDelimiter();
>   String escapeChar = getRowFormat().getEscapeChar();
>   if (getFileFormat() != THdfsFileFormat.TEXT
>   || getFileFormat() != THdfsFileFormat.SEQUENCE_FILE) {  // Should 
> use && instead
> if (fieldDelimiter != null) {
>   analyzer.addWarning("'ROW FORMAT DELIMITED FIELDS TERMINATED BY '"
>   + fieldDelimiter + "'' is ignored.");
> }
> if (lineDelimiter != null) {
>   analyzer.addWarning("'ROW FORMAT DELIMITED LINES TERMINATED BY '"
>   + lineDelimiter + "'' is ignored.");
> }
> if (escapeChar != null) {
>   analyzer.addWarning(
>   "'ROW FORMAT DELIMITED ESCAPED BY '" + escapeChar + "'' is 
> ignored.");
> }
>   }
> }
> {code}
> https://github.com/apache/impala/blob/8271bdd587d241cd5a61ccae7422bbb5fafcfaf7/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java#L276
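[Editor's note] The bug is a classic De Morgan slip: for any single file format f, `f != TEXT || f != SEQUENCE_FILE` is always true (at least one of the two inequalities must hold), so the warning fires for every format, including TEXT. A minimal standalone sketch of the predicate and its fix — illustrative names, not Impala's actual classes:

```java
enum FileFormat { TEXT, SEQUENCE_FILE, PARQUET }

class DelimiterWarning {
    // Buggy: with ||, at least one inequality always holds, so this
    // returns true even for TEXT and SEQUENCE_FILE tables.
    static boolean warnBuggy(FileFormat f) {
        return f != FileFormat.TEXT || f != FileFormat.SEQUENCE_FILE;
    }

    // Fixed: warn only when the format is neither TEXT nor SEQUENCE_FILE,
    // i.e. only when the row format delimiters are actually ignored.
    static boolean warnFixed(FileFormat f) {
        return f != FileFormat.TEXT && f != FileFormat.SEQUENCE_FILE;
    }

    public static void main(String[] args) {
        System.out.println(warnBuggy(FileFormat.TEXT));    // true (spurious warning)
        System.out.println(warnFixed(FileFormat.TEXT));    // false
        System.out.println(warnFixed(FileFormat.PARQUET)); // true
    }
}
```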






[jira] [Updated] (IMPALA-8462) Get exhaustive tests passing with dockerised minicluster

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-8462:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Get exhaustive tests passing with dockerised minicluster
> 
>
> Key: IMPALA-8462
> URL: https://issues.apache.org/jira/browse/IMPALA-8462
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Priority: Major
>







[jira] [Updated] (IMPALA-11606) add 'untracked memory' metric

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11606:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> add 'untracked memory' metric
> -
>
> Key: IMPALA-11606
> URL: https://issues.apache.org/jira/browse/IMPALA-11606
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: yx91490
>Assignee: yx91490
>Priority: Major
>
> Add a gauge metric 'untracked memory' to record the memory size that is not 
> tracked by the mem-tracker.






[jira] [Updated] (IMPALA-6783) Rethink the end-to-end queuing at KrpcDataStreamReceiver

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-6783:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Rethink the end-to-end queuing at KrpcDataStreamReceiver
> 
>
> Key: IMPALA-6783
> URL: https://issues.apache.org/jira/browse/IMPALA-6783
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 2.12.0
>Reporter: Michael Ho
>Priority: Major
>
> Follow up from IMPALA-6116. We currently bound the memory usage of service 
> queue and force a RPC to retry if the memory usage exceeds the configured 
> limit. The deserialization of row batches happen in the context of service 
> threads. The deserialized row batches are stored in a queue in the receiver 
> and its memory consumption is bound by FLAGS_exchg_node_buffer_size_bytes. 
> Exceeding that limit, we will put incoming row batches into a deferred RPC 
> queue, which will be drained by deserialization threads. This makes it hard 
> to size the service queues, as their capacity may need to grow as the number 
> of nodes in the cluster grows.
> We may need to reconsider the role of service queue: it could just be a 
> transition queue before KrpcDataStreamMgr routes the incoming row batches to 
> the appropriate receivers. The actual queuing may happen in the receiver. The 
> deserialization should always happen in the context of deserialization 
> threads so the service threads will just be responsible for routing the RPC 
> requests. This allows us to keep a rather small service queue. Incoming 
> serialized row batches will always sit in a queue to be drained by 
> deserialization threads. We may still need to keep a certain number of 
> deserialized row batches around ready to be consumed. In this way, we can 
> account for the memory consumption and size the queue based on number of 
> senders and memory budget of a query.
> One hurdle is that we need to overcome the undesirable cross-thread 
> allocation pattern as rpc_context is allocated from service threads but freed 
> by the deserialization thread.
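[Editor's note] The routing model described above can be sketched as a pair of queues: a bounded per-receiver queue fed by RPC service threads (which only route, never deserialize) and drained by dedicated deserialization threads, plus a queue of ready batches for the exchange node. This is an illustrative sketch under those assumptions, with hypothetical names, not Impala's actual KrpcDataStreamMgr code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ReceiverQueueSketch {
    private final BlockingQueue<byte[]> serialized;   // per-receiver, bounded
    private final BlockingQueue<String> deserialized = new LinkedBlockingQueue<>();

    ReceiverQueueSketch(int perReceiverCapacity) {
        // Bounded by the query's memory budget (senders x batch size),
        // so the shared RPC service queue can stay small.
        this.serialized = new LinkedBlockingQueue<>(perReceiverCapacity);
    }

    // Called from an RPC service thread: route only, never deserialize here.
    void route(byte[] batch) {
        try {
            serialized.put(batch); // blocks once the receiver's budget is used up
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    // Called from a dedicated deserialization thread.
    void drainOne() {
        try {
            // new String(...) stands in for real row-batch deserialization.
            deserialized.put(new String(serialized.take()));
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    // Called by the exchange node to consume ready batches.
    String next() {
        try {
            return deserialized.take();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The bounded `serialized` queue is what lets memory be accounted per query instead of per service queue.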






[jira] [Updated] (IMPALA-8123) Add tool to plot resource usage

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-8123:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Add tool to plot resource usage
> ---
>
> Key: IMPALA-8123
> URL: https://issues.apache.org/jira/browse/IMPALA-8123
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Infrastructure
>Reporter: Lars Volker
>Assignee: Lars Volker
>Priority: Major
>  Labels: observability, supportability
>
> IMPALA-7694 adds resource usage time series to the thrift profile. In a 
> subsequent change we should add a tool that makes plotting the data easier.






[jira] [Updated] (IMPALA-9400) Impala Ozone Support

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-9400:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Impala Ozone Support
> 
>
> Key: IMPALA-9400
> URL: https://issues.apache.org/jira/browse/IMPALA-9400
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Reporter: Sahil Takiar
>Assignee: Michael Smith
>Priority: Major
>  Labels: ozone
>
> Impala should be able to read/write data from Apache Ozone: 
> [https://hadoop.apache.org/ozone/]






[jira] [Updated] (IMPALA-6194) Ensure all fragment instances notice cancellation

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-6194:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Ensure all fragment instances notice cancellation
> -
>
> Key: IMPALA-6194
> URL: https://issues.apache.org/jira/browse/IMPALA-6194
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Lars Volker
>Priority: Major
>  Labels: observability, supportability
>
> Currently queries can get stuck in an uncancellable state, e.g. when blocking 
> on function calls or condition variables without periodically checking for 
> cancellation. We should eliminate all those calls and make sure we don't 
> re-introduce such issues. One option would be a watchdog to check that each 
> fragment instance regularly calls RETURN_IF_CANCEL.






[jira] [Updated] (IMPALA-7672) Play nice with load balancers when shutting down coordinator

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-7672:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Play nice with load balancers when shutting down coordinator
> 
>
> Key: IMPALA-7672
> URL: https://issues.apache.org/jira/browse/IMPALA-7672
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: resource-management
>
> This is a placeholder to figure out what we need to do to get load balancers 
> like HAProxy and F5 to cleanly switch to alternative coordinators when we do 
> a graceful shutdown. E.g. do we need to stop accepting new TCP connections?






[jira] [Updated] (IMPALA-9846) Switch to aggregated runtime profile representation

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-9846:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Switch to aggregated runtime profile representation
> ---
>
> Key: IMPALA-9846
> URL: https://issues.apache.org/jira/browse/IMPALA-9846
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: multithreading
>
> We need to ensure that the aggregated profile is an adequate replacement, 
> then switch over the default.






[jira] [Updated] (IMPALA-8918) Drop-Table-If-Exists can't drop unloaded table

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-8918:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Drop-Table-If-Exists can't drop unloaded table
> --
>
> Key: IMPALA-8918
> URL: https://issues.apache.org/jira/browse/IMPALA-8918
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.3.0
>Reporter: Quanlong Huang
>Priority: Major
>
> "Unloaded" in the title means Impala does know the table name but has not 
> loaded its metadata.
> If a table is created in Hive and INVALIDATE METADATA has not been run in 
> Impala, DROP TABLE IF EXISTS on it will always return "Table does 
> not exist."
> Repro:
> {code:java}
> $ beeline -u "jdbc:hive2://localhost:11050/default;" -e "create table abcd (i 
> int)"
> $ bin/impala-shell.sh
> [localhost:21000] default> drop table if exists abcd;
> Query: drop table if exists abcd
> +-----------------------+
> | summary               |
> +-----------------------+
> | Table does not exist. |
> +-----------------------+
> Fetched 1 row(s) in 0.04s
> [localhost:21000] default> invalidate metadata abcd;
> Query: invalidate metadata abcd
> Query submitted at: 2019-09-04 06:03:19 (Coordinator: 
> http://quanlong-OptiPlex-7060:25000)
> Query progress can be monitored at: 
> http://quanlong-OptiPlex-7060:25000/query_plan?query_id=c34120fc1891edae:38a6bf07
> Fetched 0 row(s) in 0.05s
> [localhost:21000] default> drop table if exists abcd;
> Query: drop table if exists abcd
> +-------------------------+
> | summary                 |
> +-------------------------+
> | Table has been dropped. |
> +-------------------------+
> Fetched 1 row(s) in 3.91s {code}
> Drop-Table-If-Exists should be able to drop such kind of a table.






[jira] [Updated] (IMPALA-7969) Always admit trivial queries immediately

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-7969:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Always admit trivial queries immediately
> 
>
> Key: IMPALA-7969
> URL: https://issues.apache.org/jira/browse/IMPALA-7969
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Yida Wu
>Priority: Major
>  Labels: admission-control
>
> Here are two common query types that consume minimal resources:
> * {{select ... from ... limit 0}}, which is used by some clients to determine 
> column types
> * {{select , , }}, which just evaluates some constant 
> expressions on the coordinator
> Currently these queries get queued if there are existing queued queries or 
> the limit on the number of running queries is exceeded, which is inconvenient 
> for use cases where latency is important. I think the planner should identify 
> trivial queries and the admission controller should admit them immediately.
> Here's an initial thought on the definition of a trivial query:
> * Must have PLAN ROOT SINK as the root
> * Can contain UNION and EMPTYSET nodes only
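The two criteria above amount to a simple plan-tree check. A hedged Python sketch (Impala's planner is Java, so the node names like `PLAN_ROOT_SINK` and the dict-based plan representation here are purely illustrative):

```python
# Illustrative trivial-query test: the root must be a PLAN ROOT SINK and
# everything below it may only be UNION or EMPTYSET nodes. Node names and
# the dict plan representation are assumptions, not Impala's actual code.

def is_trivial_plan(node):
    """Return True if the plan rooted at `node` matches the proposed rule."""
    if node["type"] != "PLAN_ROOT_SINK":
        return False
    stack = list(node.get("children", []))
    while stack:
        child = stack.pop()
        if child["type"] not in ("UNION", "EMPTYSET"):
            return False
        stack.extend(child.get("children", []))
    return True

# A "select <constants>" style plan: constants evaluated by a UNION node.
const_plan = {"type": "PLAN_ROOT_SINK",
              "children": [{"type": "UNION", "children": []}]}
# A plan with a real scan should not be admitted as trivial.
scan_plan = {"type": "PLAN_ROOT_SINK",
             "children": [{"type": "SCAN_HDFS", "children": []}]}
```

Under this rule, `limit 0` and constant-expression queries pass while anything touching a scan node does not.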






[jira] [Updated] (IMPALA-10789) Early materialize expressions in ScanNode

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-10789:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Early materialize expressions in ScanNode
> -
>
> Key: IMPALA-10789
> URL: https://issues.apache.org/jira/browse/IMPALA-10789
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Frontend
>Affects Versions: Impala 4.1.0
>Reporter: Xianqing He
>Assignee: Xianqing He
>Priority: Major
>
> Impala uses late materialization to calculate expressions. For example,
> {code:java}
> SELECT SUM(col), COUNT(col), MIN(col), MAX(col)
> FROM (
>   SELECT CAST(regexp_extract(string_col, '(\\d+)', 0) AS bigint) col
>   FROM functional_parquet.alltypesagg
> ) t{code}
> The plan looks like:
> {code:java}
> PLAN-ROOT SINK
> |
> 03:AGGREGATE [FINALIZE]
> |  output: sum:merge(col), count:merge(col), min:merge(col), max:merge(col)
> |  row-size=32B cardinality=1
> |
> 02:EXCHANGE [UNPARTITIONED]
> |
> 01:AGGREGATE
> |  output: sum(CAST(regexp_extract(string_col, '(\\d+)', 0) AS BIGINT)), 
> count(CAST(regexp_extract(string_col, '(\\d+)', 0) AS BIGINT)), 
> min(CAST(regexp_extract(string_col, '(\\d+)', 0) AS BIGINT)), 
> max(CAST(regexp_extract(string_col, '(\\d+)', 0) AS BIGINT))
> |  row-size=32B cardinality=1
> |
> 00:SCAN HDFS [functional_parquet.alltypesagg]
>partitions=11/11 files=11 size=464.70KB
>row-size=15B cardinality=11.00K
> {code}
> In the aggregation phase, the expressions in the parameters of the aggregation 
> functions are evaluated. As a result, the same expression that appears in 
> multiple aggregation functions is evaluated multiple times, which 
> leads to long execution times, especially for complex expressions such as 
> regular expressions.
> The same applies to analytic functions and queries containing UNION ALL:
> {code:java}
> SELECT SUM(int_col) OVER (PARTITION BY id )
> FROM (
> SELECT id
> , CASE
> WHEN id = 10 THEN tinyint_col
> WHEN string_col LIKE '%6%' THEN smallint_col
> END AS int_col
> FROM functional_parquet.alltypesagg
> UNION ALL
> SELECT id
> , CASE
> WHEN id = 10 THEN tinyint_col
> WHEN string_col LIKE '%6%' THEN smallint_col
> END AS int_col
> FROM functional_parquet.alltypes
> ) t
> {code}
> The plan looks like:
> {code:java}
> PLAN-ROOT SINK
> |
> 06:EXCHANGE [UNPARTITIONED]
> |
> 04:ANALYTIC
> |  functions: sum(int_col)
> |  partition by: id
> |  row-size=14B cardinality=18.30K
> |
> 03:SORT
> |  order by: id ASC NULLS FIRST
> |  row-size=6B cardinality=18.30K
> |
> 05:EXCHANGE [HASH(id)]
> |
> 00:UNION
> |  row-size=6B cardinality=18.30K
> |
> |--02:SCAN HDFS [functional_parquet.alltypes]
> | partitions=24/24 files=24 size=189.91KB
> | row-size=22B cardinality=7.30K
> |
> 01:SCAN HDFS [functional_parquet.alltypesagg]
>partitions=11/11 files=11 size=464.70KB
>row-size=24B cardinality=11.00K{code}
> The UnionNode materializes expressions and prunes columns.
> Currently the UnionNode is single-threaded while the ScanNode supports 
> multi-threading, so query performance would improve if expressions were 
> materialized in the ScanNode.
> We can specify which expressions require early materialization via hints, and 
> Impala internally determines whether the expression can be evaluated in the 
> ScanNode.
> {code:java}
> SELECT SUM(col), COUNT(col), MIN(col), MAX(col)
> FROM (
>   SELECT CAST(regexp_extract(string_col, '(\\d+)', 0) AS bigint) 
> col/*+materialize_expr*/
>   FROM functional_parquet.alltypesagg
> ) t
> {code}
> This can be materialized in the ScanNode, but the following can't use early 
> materialization:
> {code:java}
> SELECT SUM(col)
> FROM (
>   SELECT CASE 
>   WHEN t1.id = 10 THEN t2.tinyint_col
>   ELSE t2.smallint_col
>   END AS col/*+materialize_expr*/
>   FROM functional_parquet.alltypesagg t1
>   JOIN functional_parquet.alltypes t2 ON t1.id = t2.id
> ) t
> {code}
>  






[jira] [Updated] (IMPALA-8624) Collect coredumps, minidumps and hserr*.log files from dockerised tests

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-8624:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Collect coredumps, minidumps and hserr*.log files from dockerised tests
> ---
>
> Key: IMPALA-8624
> URL: https://issues.apache.org/jira/browse/IMPALA-8624
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Priority: Major
>







[jira] [Updated] (IMPALA-8804) Memory based admission control is always disabled if pool max mem is not set

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-8804:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Memory based admission control is always disabled if pool max mem is not set
> 
>
> Key: IMPALA-8804
> URL: https://issues.apache.org/jira/browse/IMPALA-8804
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Andrew Sherman
>Priority: Major
>  Labels: admission-control
> Attachments: minicluster-fair-scheduler.xml, 
> minicluster-llama-site.xml
>
>
> Memory-based admission control doesn't kick in with the provided config files 
> where no max memory is configured for the pool. This is the documented 
> behaviour and not a bug - see 
> https://impala.apache.org/docs/build/html/topics/impala_admission.html. 
> However, it is inconvenient since you need to specify some max memory value 
> even if you don't want to limit the pool's share of the clusters resources 
> (or the cluster is variable in size).
> This is unfriendly. It is also confusing since there is no explicit way to 
> enable memory-based admission control.
> You can work around this by setting the pool max memory to a very high value. 
> To reproduce, start a minicluster with the provided configs. If you submit 
> multiple memory-intensive queries in parallel, they will never be queued.
> {noformat}
> start-impala-cluster.py 
> --impalad_args="-fair_scheduler_allocation_path=minicluster-fair-scheduler.xml
>  -llama_site_path=minicluster-llama-site.xml" 
> --impalad_args=-vmodule=admission-controller
> {noformat}






[jira] [Updated] (IMPALA-9654) Intra-node execution skew increase with mt_dop

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-9654:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Intra-node execution skew increase with mt_dop
> --
>
> Key: IMPALA-9654
> URL: https://issues.apache.org/jira/browse/IMPALA-9654
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Bikramjeet Vig
>Priority: Major
>  Labels: multithreading, performance
>
> We've seen significant amounts of execution skew (big gap between avg and max 
> execution time for a scan node) with multithreading enabled on TPC-DS 
> queries. We balance bytes well, but bytes of input files are often not 
> correlated with the amount of work in the scan, or above the scan. Some 
> causes are:
> * Dynamic partition pruning leading to different instances with variable 
> numbers of input splits
> * Different amounts of rows being filtered out by predicates and row filters, 
> leading to skew in rows returned from the plan.
> * Different amounts of compressibility
> * Files being written in different ways, e.g. different schema, different 
> writer.
> More dynamic load balancing can address all of this if scans pick up the next 
> range when its pipeline has finished processing the rows from the previous 
> range. I.e. with the threading model we can deal with time skew anywhere in 
> the pipeline by balancing in the scan.
> I *think* we can solve this for HDFS scans by lifting the ReaderContext up to 
> the FragmentState (one per plan node) and making corresponding changes to the 
> scan implementation. We would need to add a bit more machinery to support 
> Kudu and HBase scans but I think a similar approach would work conceptually.
> A more invasive (and probably expensive) solution is to do a local exchange 
> above the scan node, e.g. a multi-producer multi-consumer queue.






[jira] [Updated] (IMPALA-7732) Check / Implement resource limits documented in IMPALA-5605

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-7732:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Check / Implement resource limits documented in IMPALA-5605
> ---
>
> Key: IMPALA-7732
> URL: https://issues.apache.org/jira/browse/IMPALA-7732
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Michael Ho
>Priority: Major
>
> IMPALA-5605 documents a list of recommended bump in system resource limits 
> which may be necessary when running Impala at scale. We may consider checking 
> those limits at startup with {{getrlimit()}} and potentially setting them 
> with {{setrlimit()}} if possible. At the minimum, may be helpful to log a 
> warning message if the limit is below certain threshold.
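The suggested startup check can be sketched in Python, whose `resource` module wraps the same `getrlimit()`/`setrlimit()` calls (Impala's daemons are C++). The 32768 minimum below is an illustrative assumption, not the documented recommendation from IMPALA-5605:

```python
# Illustrative sketch of checking one resource limit at startup, as the
# ticket suggests: read the limit with getrlimit(), raise it with
# setrlimit() if the hard limit allows, otherwise log a warning. The
# recommended minimum is an assumption for illustration only.
import resource

def check_fd_limit(min_soft=32768):
    """Check the RLIMIT_NOFILE soft limit against a recommended minimum."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= min_soft:
        return "ok"
    if hard == resource.RLIM_INFINITY or hard >= min_soft:
        # The hard limit permits it: raise the soft limit in-process.
        resource.setrlimit(resource.RLIMIT_NOFILE, (min_soft, hard))
        return "raised"
    # Cannot raise past the hard limit; at minimum, warn about it.
    return "warning: soft limit %d below recommended %d" % (soft, min_soft)
```

The same pattern would apply to the other limits listed in IMPALA-5605 (e.g. max processes or max locked memory).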






[jira] [Updated] (IMPALA-9951) Skew in analytic sorts when partition key has low cardinality

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-9951:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Skew in analytic sorts when partition key has low cardinality
> -
>
> Key: IMPALA-9951
> URL: https://issues.apache.org/jira/browse/IMPALA-9951
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: multithreading, tpcds
>
> In queries like TPC-DS Q67, the cardinality of the PARTITION BY expression of 
> the analytic may be much lower than the parallelism of the input fragment. In 
> this case the runtime of the sort can be skewed. We could mitigate the 
> problem by doing the expensive sort *before* the exchange, so that the 
> analytic fragment only needs to merge together its sorted input and evaluate 
> the analytic over it.
> The impact of this is greater with multithreading, so I am considering only 
> changing the default when mt_dop > 0.






[jira] [Updated] (IMPALA-6876) Entries in CatalogUsageMonitor are not cleared after invalidation

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-6876:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Entries in CatalogUsageMonitor are not cleared after invalidation
> -
>
> Key: IMPALA-6876
> URL: https://issues.apache.org/jira/browse/IMPALA-6876
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Dimitris Tsirogiannis
>Priority: Major
>  Labels: memory-leak, ramp-up
>
> The CatalogUsageMonitor in the catalog maintains a small cache of references 
> to tables that: a) are accessed frequently in the catalog and b) have the 
> highest memory requirements. These entries are not cleared upon server or 
> table invalidation, thus preventing the GC from collecting the memory of 
> these tables. We should make sure that the CatalogUsageMonitor does not 
> maintain entries of tables that have been invalidated or deleted. 
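A minimal sketch of the intended behaviour: invalidation must drop the cached references so the collector can reclaim the table's memory. The real CatalogUsageMonitor is Java and its fields differ; the class below is purely illustrative:

```python
# Illustrative usage monitor whose entries are removed on invalidation,
# so the monitor no longer pins invalidated tables in memory. Field names
# are assumptions, not the actual CatalogUsageMonitor members.

class UsageMonitor:
    def __init__(self):
        self.frequently_accessed = {}  # table name -> access count
        self.high_memory = {}          # table name -> estimated bytes

    def record_access(self, table, mem_bytes):
        """Track an access and the table's current memory estimate."""
        count = self.frequently_accessed.get(table, 0)
        self.frequently_accessed[table] = count + 1
        self.high_memory[table] = mem_bytes

    def invalidate(self, table):
        """Drop the entries so no stale reference outlives the table."""
        self.frequently_accessed.pop(table, None)
        self.high_memory.pop(table, None)
```

In the Java catalog the analogous fix would clear (or use weak references for) both caches when a table is invalidated or dropped.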






[jira] [Updated] (IMPALA-11417) support outer join elimination

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11417:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> support outer join elimination
> --
>
> Key: IMPALA-11417
> URL: https://issues.apache.org/jira/browse/IMPALA-11417
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
>
> We are facing a scenario where two tables are outer joined but only fields 
> from the outer side table are used and the join key of the inner side table 
> is guaranteed to be unique. Take the following simplified query as an 
> example, where s.id is guaranteed to be unique:
> {code:sql}
> -- drop the test tables if exists:
> drop table if exists t;
> drop table if exists s;
> -- create test tables:
> create table t (s_id bigint, value bigint);
> create table s(id bigint, value bigint, primary key(id));
> -- the test SQL:
> select t.* from t left join s on t.s_id = s.id;
> {code} 
> the above query can be optimized to the following if we can utilize the 
> primary key information:
> {code:sql}
> select t.* from t;
> {code}






[jira] [Commented] (IMPALA-11734) TestIcebergTable.test_compute_stats fails in RELEASE builds

2022-11-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637722#comment-17637722
 ] 

ASF subversion and git services commented on IMPALA-11734:
--

Commit 08a04d2495c2fb0968917ff29c37b08b7bea0a1f in impala's branch 
refs/heads/branch-4.2.0 from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=08a04d249 ]

IMPALA-11734: TestIcebergTable.test_compute_stats fails in RELEASE builds

If the Impala version is set to a release build as described in point 8
in the "How to Release" document
(https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate),
TestIcebergTable.test_compute_stats fails:

Stacktrace
query_test/test_iceberg.py:852: in test_compute_stats
self.run_test_case('QueryTest/iceberg-compute-stats', vector,
unique_database) common/impala_test_suite.py:742: in run_test_case
self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:578: in __verify_results_and_errors
replace_filenames_with_placeholder) common/test_result_verifier.py:469:
in verify_raw_results VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:278: in verify_query_result_is_equal
assert expected_results == actual_results E assert Comparing
QueryTestResults (expected vs actual): E 2,1,'2.33KB','NOT CACHED','NOT
CACHED','PARQUET','false','hdfs://localhost:20500/test-warehouse/test_compute_stats_74dbc105.db/ice_alltypes'
!= 2,1,'2.32KB','NOT CACHED','NOT
CACHED','PARQUET','false','hdfs://localhost:20500/test-warehouse/test_compute_stats_74dbc105.db/ice_alltypes'

The problem is the file size which is 2.32KB instead of 2.33KB. This is
because the version is written into the file, and "x.y.z-RELEASE" is one
byte shorter than "x.y.z-SNAPSHOT". The size of the file in this test is
on the boundary between 2.32KB and 2.33KB, so this one byte can change
the value.

This change fixes the problem by using a regex to accept both values so
it works for both snapshot and release versions.

Change-Id: Ia1fa12eebf936ec2f4cc1d5f68ece2c96d1256fb
Reviewed-on: http://gerrit.cloudera.org:8080/19260
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> TestIcebergTable.test_compute_stats fails in RELEASE builds
> ---
>
> Key: IMPALA-11734
> URL: https://issues.apache.org/jira/browse/IMPALA-11734
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> If the Impala version is set to a release build as described in point 8 in 
> the "How to Release" document 
> ([https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate]),
>  TestIcebergTable.test_compute_stats fails:
> h3. Stacktrace
> {code:java}
> query_test/test_iceberg.py:852: in test_compute_stats 
> self.run_test_case('QueryTest/iceberg-compute-stats', vector, 
> unique_database) common/impala_test_suite.py:742: in run_test_case 
> self.__verify_results_and_errors(vector, test_section, result, use_db) 
> common/impala_test_suite.py:578: in __verify_results_and_errors 
> replace_filenames_with_placeholder) common/test_result_verifier.py:469: in 
> verify_raw_results VERIFIER_MAP[verifier](expected, actual) 
> common/test_result_verifier.py:278: in verify_query_result_is_equal assert 
> expected_results == actual_results E assert Comparing QueryTestResults 
> (expected vs actual): E 2,1,'2.33KB','NOT CACHED','NOT 
> CACHED','PARQUET','false','hdfs://localhost:20500/test-warehouse/test_compute_stats_74dbc105.db/ice_alltypes'
>  != 2,1,'2.32KB','NOT CACHED','NOT 
> CACHED','PARQUET','false','hdfs://localhost:20500/test-warehouse/test_compute_stats_74dbc105.db/ice_alltypes'{code}
> The problem is the file size which is 2.32KB instead of 2.33KB. This is 
> because the version is written into the file, and "x.y.z-RELEASE" is one byte 
> shorter than "x.y.z-SNAPSHOT". The size of the file in this test is on the 
> boundary between 2.32KB and 2.33KB, so this one byte can change the value.
> We could use a row_regex to accept both values so it works for both snapshot 
> and release versions.
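To see why a single byte flips the rendered size, and how a row_regex-style pattern accepts both renderings, here is a small sketch. The rounding mimics two-decimal KB formatting; Impala's exact pretty-printer may differ, and the byte counts are chosen for illustration:

```python
# One byte around the 2.325 KB boundary changes the two-decimal rendering,
# which is exactly the flakiness the test hit. A regex accepting both
# values makes the test pass for SNAPSHOT and RELEASE builds alike.
import re

def pretty_kb(num_bytes):
    """Render a byte count as KB with two decimal places."""
    return "%.2fKB" % (num_bytes / 1024.0)

snapshot = pretty_kb(2381)  # e.g. the file written by an "x.y.z-SNAPSHOT" build
release = pretty_kb(2380)   # one byte shorter, as with "x.y.z-RELEASE"

row_regex = re.compile(r"2\.3[23]KB")  # accepts both renderings
```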






[jira] [Updated] (IMPALA-9826) Pass LIBHDFS_OPTS through to the impalads for the docker configuration

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-9826:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Pass LIBHDFS_OPTS through to the impalads for the docker configuration
> --
>
> Key: IMPALA-9826
> URL: https://issues.apache.org/jira/browse/IMPALA-9826
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.0.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> In a recent run of ubuntu-16.04-dockerised-tests, an Impalad crashed and I 
> noticed that it sent its hs_err_pid*.log file to /tmp:
>  
> {noformat}
> # An error report file with more information is saved as:
> # /tmp/hs_err_pid1.log{noformat}
> Ordinarily, this is controlled by setting XX:ErrorFile in LIBHDFS_OPTS:
> {noformat}
> export LIBHDFS_OPTS="${LIBHDFS_OPTS:-} 
> -Djava.library.path=${HADOOP_LIB_DIR}/native/"
> LIBHDFS_OPTS+=" -XX:ErrorFile=${IMPALA_LOGS_DIR}/hs_err_pid%p.log"{noformat}
> It looks like LIBHDFS_OPTS is not getting passed through to the Impalad when 
> running in docker. We should pass that through, as it will put 
> hs_err_pid*.log files in a location that is archived by the Jenkins job.
>  






[jira] [Updated] (IMPALA-9147) The E2E tests in min_max_filters.test do not exercise the code paths in min-max-filter.cc

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-9147:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> The E2E tests in min_max_filters.test do not exercise the code paths in 
> min-max-filter.cc 
> --
>
> Key: IMPALA-9147
> URL: https://issues.apache.org/jira/browse/IMPALA-9147
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> The E2E tests in min_max_filters.test do not exercise the code paths in 
> {{min-max-filter.cc}}. This could be verified by commenting out the return 
> statement at 
> [https://github.com/apache/impala/blob/master/be/src/util/min-max-filter.cc#L684],
>  and then running the corresponding E2E tests defined in {{min_max_filters.test}}.
> {code:java}
> $IMPALA_HOME/bin/impala-py.test 
> tests/query_test/test_runtime_filters.py::TestMinMaxFilters::test_min_max_filters
> {code}
> After commenting out that return statement in {{min-max-filter.cc}}, we 
> expect to hit a DCHECK at 
> https://github.com/apache/impala/blob/master/be/src/util/min-max-filter.cc#L698.
>  But according to my observation we are not able to hit that DCHECK, implying 
> that the test query at 
> https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test#L48-L50
>  does not trigger the code paths within 
> https://github.com/apache/impala/blob/master/be/src/util/min-max-filter.cc#L655-L699.
> We hence need to improve these E2E tests so that the respective code paths 
> are exercised.






[jira] [Updated] (IMPALA-10941) Revert back the change for thrift_sasl 0.4.3

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-10941:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Revert back the change for thrift_sasl 0.4.3
> 
>
> Key: IMPALA-10941
> URL: https://issues.apache.org/jira/browse/IMPALA-10941
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 4.1.0
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
>
> A recent patch upgraded thrift_sasl from 0.4.2 to 0.4.3, which broke the 
> builds on CentOS 7. We fixed the build issue by changing setup.py for 
> thrift_sasl 0.4.3 as a workaround. A better approach is to create a 
> temporary virtualenv prior to the invocation of 
> python in shell/make_shell_tarball.sh, upgrade pip and install up-to-date 
> setuptools in the virtualenv, and remove the virtualenv after the 
> python code has run. With this approach, we can revert the code change made 
> for thrift_sasl 0.4.3.
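The proposed flow can be sketched with Python's stdlib venv module. The build command, prefix, and return value are placeholders; the actual change would live in shell/make_shell_tarball.sh and its helpers:

```python
# Hedged sketch of the proposed flow: create a throwaway virtualenv,
# optionally upgrade pip and setuptools inside it, run a python build
# step with the venv's interpreter, then delete the venv. The command
# passed in is a placeholder, not the real make_shell_tarball.sh step.
import os
import shutil
import subprocess
import tempfile
import venv

def run_in_temp_venv(build_cmd, upgrade=True):
    """Run `python <build_cmd>` inside a temporary virtualenv, then remove it."""
    tmpdir = tempfile.mkdtemp(prefix="impala-venv-")
    try:
        venv.create(tmpdir, with_pip=upgrade)
        py = os.path.join(tmpdir, "bin", "python")  # "Scripts" on Windows
        if upgrade:
            # Upgrade pip and install up-to-date setuptools, as proposed.
            subprocess.check_call([py, "-m", "pip", "install", "--upgrade",
                                   "pip", "setuptools"])
        return subprocess.check_output([py] + build_cmd)
    finally:
        shutil.rmtree(tmpdir)  # remove the virtualenv after the invocation
```

Keeping the venv strictly scoped to the build step means the workaround edits to thrift_sasl's setup.py are no longer needed.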






[jira] [Updated] (IMPALA-10262) Linux Packaging Support

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-10262:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Linux Packaging Support
> ---
>
> Key: IMPALA-10262
> URL: https://issues.apache.org/jira/browse/IMPALA-10262
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Infrastructure
>Reporter: Shant Hovsepian
>Assignee: Quanlong Huang
>Priority: Major
>
> It would be nice if we could easily build installation packages, for example 
> RPM or DEB packages, from the Impala source code.






[jira] [Updated] (IMPALA-11559) Check that the expected last_modified_time is the same as what's on the filesystem failed

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11559:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Check that the expected last_modified_time is the same as what's on the 
> filesystem failed
> -
>
> Key: IMPALA-11559
> URL: https://issues.apache.org/jira/browse/IMPALA-11559
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0.0, Impala 4.1.0
>Reporter: Xianqing He
>Assignee: changxin
>Priority: Major
>
> Check that the expected last_modified_time is the same as what's on the 
> filesystem failed, throwing the exception "The library xx.so last modified 
> time 16588349 does not match the expected last modified time 0".
> After running 'refresh functions', the exception is thrown again.






[jira] [Updated] (IMPALA-9178) Provide more advice about minimum query memory limit

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-9178:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Provide more advice about minimum query memory limit
> 
>
> Key: IMPALA-9178
> URL: https://issues.apache.org/jira/browse/IMPALA-9178
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Reporter: Tim Armstrong
>Priority: Major
>
> It's generally a good idea not to set the minimum query memory limit to too 
> low a value - queries get starved and performance suffers, and it's easy to 
> run into the remaining out-of-memory issues where we haven't implemented 
> reservations.
> Our docs aren't really prescriptive enough here.






[jira] [Updated] (IMPALA-3119) DDL support for bucketed tables

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-3119:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> DDL support for bucketed tables
> ---
>
> Key: IMPALA-3119
> URL: https://issues.apache.org/jira/browse/IMPALA-3119
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Reporter: Mostafa Mokhtar
>Assignee: Baike Xia
>Priority: Minor
>  Labels: ramp-up
>
> Reference 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables






[jira] [Updated] (IMPALA-8126) Move per-host resource utilization counters to per-host profiles in coordinator

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-8126:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Move per-host resource utilization counters to per-host profiles in 
> coordinator
> ---
>
> Key: IMPALA-8126
> URL: https://issues.apache.org/jira/browse/IMPALA-8126
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Lars Volker
>Priority: Major
>  Labels: observability, profile, supportability
>
> Once IMPALA-7694 gets in we should move the per-host resource utilization 
> counters in {{Coordinator::ComputeQuerySummary()}} to the host profiles in 
> {{Coordinator::BackendState}}.
> [~tarmstrong] pointed out 
> [here|https://gerrit.cloudera.org/#/c/12069/13/be/src/runtime/coordinator.cc@789]:
> {quote}
> We could also simplify BackendState::ComputeResourceUtilization() to just use 
> the per-backend counters instead of iterating over fragments.
> I think there may be some compatibility concerns about removing these - 
> existence of the counters isn't contractual but we don't want to break useful 
> tools if avoidable.
> For example, I confirmed that Cloudera Manager actually does parse the 
> existing strings (which is a little sad, but understandable given the lack of 
> other counters).
> {quote}






[jira] [Updated] (IMPALA-7862) Conversion of timestamps after validation can move them out of range in Parquet scans

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-7862:
--
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Conversion of timestamps after validation can move them out of range in 
> Parquet scans
> -
>
> Key: IMPALA-7862
> URL: https://issues.apache.org/jira/browse/IMPALA-7862
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Tim Armstrong
>Assignee: Csaba Ringhofer
>Priority: Major
>  Labels: parquet
>
> On https://gerrit.cloudera.org/#/c/8319/ Csaba observed that the sequencing 
> of conversion and validation could result in invalid timestamps.






[jira] [Updated] (IMPALA-10690) Unreachable code in SSL certificate check logic

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-10690:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Unreachable code in SSL certificate check logic
> ---
>
> Key: IMPALA-10690
> URL: https://issues.apache.org/jira/browse/IMPALA-10690
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 4.0.0
>Reporter: Fifteen
>Assignee: Fifteen
>Priority: Trivial
>
> Hi, 
> There is an unreachable branch at line 73 of `TSSLSocketWithWildcardSAN`:
> ```
> # Line73: This branch will never be reached
>  raise TTransportException(
>  type=TTransportException.UNKNOWN,
>  message='Could not validate SSL certificate from '
>  'host "%s". Cert=%s' % (self.host, cert)) 
> ```
> Can I fix it?






[jira] [Updated] (IMPALA-11654) Comparison units do not match in Statestore::MonitorSubscriberHeartbeat

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11654:
---
 Fix Version/s: (was: Impala 4.2.0)
Target Version: Impala 4.3.0

>  Comparison units do not match in Statestore::MonitorSubscriberHeartbeat
> 
>
> Key: IMPALA-11654
> URL: https://issues.apache.org/jira/browse/IMPALA-11654
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Xianqing He
>Priority: Minor
>
> Seconds are incorrectly compared against milliseconds when judging whether a 
> subscriber is slow:
> [https://github.com/apache/impala/blob/a1fddf1022b76d5226fe9d77f059f37bdee46c13/be/src/statestore/statestore.cc#L1022-L1023]
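The class of bug described here, comparing a value in milliseconds against a threshold in seconds, can be illustrated with a small Python sketch. The function names and the 30-second threshold are assumptions for illustration; the actual code is C++ in statestore.cc.

```python
def is_slow_subscriber(elapsed_ms, threshold_s):
    """Correct comparison: normalize both sides to the same unit
    (milliseconds here) before comparing."""
    return elapsed_ms > threshold_s * 1000


def is_slow_subscriber_buggy(elapsed_ms, threshold_s):
    """Buggy pattern for contrast: comparing milliseconds to seconds
    directly is off by a factor of 1000, so a 5000 ms heartbeat gap
    is wrongly flagged against a 30 s threshold (5000 > 30)."""
    return elapsed_ms > threshold_s
```

The fix is simply to convert one side so both operands share a unit before the comparison.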






[jira] [Updated] (IMPALA-11739) Skip pushing down BinaryPredicate with NullLiteral for Iceberg tables

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11739:
---
 Fix Version/s: (was: Impala 4.2.0)
Target Version: Impala 4.3.0

> Skip pushing down BinaryPredicate with NullLiteral for Iceberg tables
> -
>
> Key: IMPALA-11739
> URL: https://issues.apache.org/jira/browse/IMPALA-11739
> Project: IMPALA
>  Issue Type: Bug
>Reporter: gaoxiaoqing
>Assignee: gaoxiaoqing
>Priority: Major
>  Labels: iceberg
>
> Following query throws an Exception, "ERROR: ClassCastException: 
> org.apache.impala.analysis.NullLiteral cannot be cast to 
> org.apache.impala.analysis.StringLiteral"
> {noformat}
> select * from iceberg_alltypes_part where p_string!=NULL;{noformat}
> The same query against an HDFS table succeeds whether the filter is equal to 
> NULL or not equal to NULL; Iceberg tables should behave the same way.






[jira] [Updated] (IMPALA-11632) Exclude log4j-1.2-api in some Ranger artifacts

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11632:
---
 Fix Version/s: (was: Impala 4.2.0)
Target Version: Impala 4.3.0

> Exclude log4j-1.2-api in some Ranger artifacts
> --
>
> Key: IMPALA-11632
> URL: https://issues.apache.org/jira/browse/IMPALA-11632
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> After RANGER-3498, Ranger's ranger-plugins-audit, 
> ranger-plugins-common, and -ranger-raz-hook-abfs- start pulling in 
> log4j-1.2-api, which is currently banned by Impala's frontend. To be able to 
> compile Impala after RANGER-3498, we should exclude log4j-1.2-api when 
> adding the Ranger dependencies mentioned above.






[jira] [Updated] (IMPALA-11613) Optimize result spooling for the statement that returns at most one row

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11613:
---
 Fix Version/s: (was: Impala 4.2.0)
Target Version: Impala 4.3.0

> Optimize result spooling for the statement that returns at most one row
> ---
>
> Key: IMPALA-11613
> URL: https://issues.apache.org/jira/browse/IMPALA-11613
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Xianqing He
>Assignee: Xianqing He
>Priority: Major
>
> If result spooling is enabled and the statement returns at most one row, we 
> can set the min memory reservation based on the row size.
>  
> The strategy is as follows: 
>  # If the row contains string or complex data types, set the reservation to 
> max_row_size.
>  # Otherwise, compute the reservation from the actual row size.
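The two-rule strategy above can be sketched as a small Python function. The type names and per-type sizes are illustrative assumptions, not Impala's planner types.

```python
# Types whose values are variable-length or nested; any such column
# forces the conservative max_row_size reservation (rule 1).
VARLEN_TYPES = {"STRING", "ARRAY", "MAP", "STRUCT"}


def min_reservation_bytes(column_types, fixed_size_of, max_row_size):
    """Rule 1: any string/complex column -> reserve max_row_size.
    Rule 2: otherwise reserve the computed fixed-length row size."""
    if any(t in VARLEN_TYPES for t in column_types):
        return max_row_size
    return sum(fixed_size_of[t] for t in column_types)
```

For a single-row result this lets fixed-width schemas reserve only a few bytes instead of the default multi-megabyte minimum.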






[jira] [Updated] (IMPALA-11418) Optimize select constant statement min memory reservation

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11418:
---
 Fix Version/s: (was: Impala 4.2.0)
Target Version: Impala 4.3.0

> Optimize select constant statement min memory reservation
> 
>
> Key: IMPALA-11418
> URL: https://issues.apache.org/jira/browse/IMPALA-11418
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.0.0, Impala 4.1.0
>Reporter: Xianqing He
>Assignee: Xianqing He
>Priority: Major
>
> If SPOOL_QUERY_RESULTS is true, then the ResourceProfile sets a min 
> reservation in PlanRootSink.
> For the statement 'select 1', the min reservation is 4MB. That's no problem 
> if the cluster has enough memory available within its process limit to 
> execute the query, but if host memory is not available it will throw 'Failed 
> to get minimum memory reservation'.
> Some connection pools use 'select 1' to check whether a connection is 
> available. The check will fail if memory is oversubscribed.
> For this case we can set the min reservation to 0 to reduce failures when 
> memory is oversubscribed.
>  
> {code:java}
> Query: explain select 1
> ++
> | Explain String                                         |
> ++
> | Max Per-Host Resource Reservation: Memory=4MB Threads=1 |
> | Per-Host Resource Estimates: Memory=10MB               |
> | Codegen disabled by planner                            |
> |                                                        |
> | PLAN-ROOT SINK                                         |
> | |                                                      |
> | 00:UNION                                               |
> |    constant-operands=1                                 |
> |    row-size=1B cardinality=1                           |
> ++ {code}
>  






[jira] [Updated] (IMPALA-11564) For Agg/Scan nodes, increase the Cache of regular expressions to speed up

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11564:
---
 Fix Version/s: (was: Impala 4.2.0)
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> For Agg/Scan nodes, increase the Cache of regular expressions to speed up
> -
>
> Key: IMPALA-11564
> URL: https://issues.apache.org/jira/browse/IMPALA-11564
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Baike Xia
>Assignee: Baike Xia
>Priority: Major
>
> For Agg/Scan nodes, increase the Cache of regular expressions to speed up.






[jira] [Updated] (IMPALA-11563) Optimized /etc/sysconfig/clock to find the time zone

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11563:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Optimized /etc/sysconfig/clock to find the time zone
> 
>
> Key: IMPALA-11563
> URL: https://issues.apache.org/jira/browse/IMPALA-11563
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Baike Xia
>Assignee: Baike Xia
>Priority: Major
>
> In /etc/sysconfig/clock, human error can sometimes introduce anomalies, such 
> as stray spaces in the file, that break time zone detection.






[jira] [Updated] (IMPALA-11565) Support IF NOT EXISTS in alter table add columns for kudu table

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11565:
---
Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> Support IF NOT EXISTS in alter table add columns for kudu table
> ---
>
> Key: IMPALA-11565
> URL: https://issues.apache.org/jira/browse/IMPALA-11565
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Baike Xia
>Assignee: Baike Xia
>Priority: Major
>
> Impala already supports IF NOT EXISTS in alter table add columns for general 
> hive table in [IMPALA-7832|http://issues.apache.org/jira/browse/IMPALA-7832], 
> but not for Kudu tables. This patch tries to add the same semantics for Kudu tables.
>  






[jira] [Updated] (IMPALA-11565) Support IF NOT EXISTS in alter table add columns for kudu table

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11565:
---
Fix Version/s: (was: Impala 4.2.0)

> Support IF NOT EXISTS in alter table add columns for kudu table
> ---
>
> Key: IMPALA-11565
> URL: https://issues.apache.org/jira/browse/IMPALA-11565
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Baike Xia
>Assignee: Baike Xia
>Priority: Major
>
> Impala already supports IF NOT EXISTS in alter table add columns for general 
> hive table in [IMPALA-7832|http://issues.apache.org/jira/browse/IMPALA-7832], 
> but not for Kudu tables. This patch tries to add the same semantics for Kudu tables.
>  






[jira] [Updated] (IMPALA-11563) Optimized /etc/sysconfig/clock to find the time zone

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11563:
---
Fix Version/s: (was: Impala 4.2.0)

> Optimized /etc/sysconfig/clock to find the time zone
> 
>
> Key: IMPALA-11563
> URL: https://issues.apache.org/jira/browse/IMPALA-11563
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Baike Xia
>Assignee: Baike Xia
>Priority: Major
>
> In /etc/sysconfig/clock, human error can sometimes introduce anomalies, such 
> as stray spaces in the file, that break time zone detection.






[jira] [Updated] (IMPALA-11536) Invalid push down predicates in outer join simplification

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11536:
---
Fix Version/s: (was: Impala 4.2.0)

> Invalid push down predicates in outer join simplification
> -
>
> Key: IMPALA-11536
> URL: https://issues.apache.org/jira/browse/IMPALA-11536
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0.0, Impala 4.1.0
>Reporter: Xianqing He
>Assignee: Xianqing He
>Priority: Major
> Attachments: image-2022-08-25-14-47-51-966.png
>
>
> When ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION is set to true, outer join 
> simplification may incorrectly push down a predicate that is not 
> null-rejecting.
> e.g.
> SELECT COALESCE(jointbl.test_id, testtbl.id, dimtbl.id) AS id, 
> test_zip,testtbl.zip
> FROM functional.jointbl
> FULL OUTER JOIN
> functional.testtbl
> ON jointbl.test_id = testtbl.id
> FULL OUTER JOIN
> functional.dimtbl
> ON coalesce(jointbl.test_id, testtbl.id) = dimtbl.id
> WHERE
> `jointbl`.`test_zip` = 94611 and coalesce(`testtbl`.`zip`, 0) = 0;
>  
> !image-2022-08-25-14-47-51-966.png!  
> We can't push down the predicate 'coalesce(testtbl.zip, 0) = 0' to the 
> ScanNode since it is not null-rejecting.
>  
>  
>  
>  
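A null-rejecting check can be sketched by evaluating the predicate with the relevant slots set to NULL. The Python model below is an illustrative stand-in for the frontend analysis, with `None` modeling SQL NULL.

```python
def is_null_rejecting(predicate, slot_names):
    """A predicate is null-rejecting for a tuple if it cannot evaluate
    to TRUE when all slots of that tuple are NULL. `predicate` is a
    callable over a dict of slot values plus a coalesce helper."""
    def coalesce(*args):
        # SQL COALESCE: first non-NULL argument, else NULL.
        return next((a for a in args if a is not None), None)

    all_null = {s: None for s in slot_names}
    # If the predicate is truthy on an all-NULL row, pushing it below the
    # outer join would wrongly filter out null-extended rows.
    return not predicate(all_null, coalesce)


# 'coalesce(testtbl.zip, 0) = 0' is TRUE when zip is NULL, so it is not
# null-rejecting and must stay above the FULL OUTER JOIN.
zip_pred = lambda row, coalesce: coalesce(row["zip"], 0) == 0
```

A plain equality like `test_zip = 94611` evaluates to NULL (falsy here) on an all-NULL row, so it is null-rejecting and safe to push down.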






[jira] [Updated] (IMPALA-11536) Invalid push down predicates in outer join simplification

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11536:
---
Target Version: Impala 4.3.0

> Invalid push down predicates in outer join simplification
> -
>
> Key: IMPALA-11536
> URL: https://issues.apache.org/jira/browse/IMPALA-11536
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0.0, Impala 4.1.0
>Reporter: Xianqing He
>Assignee: Xianqing He
>Priority: Major
> Attachments: image-2022-08-25-14-47-51-966.png
>
>
> When ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION is set to true, outer join 
> simplification may incorrectly push down a predicate that is not 
> null-rejecting.
> e.g.
> SELECT COALESCE(jointbl.test_id, testtbl.id, dimtbl.id) AS id, 
> test_zip,testtbl.zip
> FROM functional.jointbl
> FULL OUTER JOIN
> functional.testtbl
> ON jointbl.test_id = testtbl.id
> FULL OUTER JOIN
> functional.dimtbl
> ON coalesce(jointbl.test_id, testtbl.id) = dimtbl.id
> WHERE
> `jointbl`.`test_zip` = 94611 and coalesce(`testtbl`.`zip`, 0) = 0;
>  
> !image-2022-08-25-14-47-51-966.png!  
> We can't push down the predicate 'coalesce(testtbl.zip, 0) = 0' to the 
> ScanNode since it is not null-rejecting.
>  
>  
>  
>  






[jira] [Resolved] (IMPALA-11196) Assertion failure in ClientCacheTest.MemLeak ASAN build

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11196.

Fix Version/s: Impala 4.1.0
   Resolution: Fixed

> Assertion failure in ClientCacheTest.MemLeak ASAN build
> ---
>
> Key: IMPALA-11196
> URL: https://issues.apache.org/jira/browse/IMPALA-11196
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Yida Wu
>Assignee: Yida Wu
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 4.1.0
>
>
> The test ClientCacheTest.MemLeak, introduced in IMPALA-11176, fails in ASAN 
> and TSAN build.
> h3. Error Message
> Value of: mem_after Actual: 22012933906432 Expected: mem_before Which is: 
> 22012768583680
> h3. Stacktrace
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/client-cache-test.cc:112
>  Value of: mem_after Actual: 22012933906432 Expected: mem_before Which is: 
> 22012768583680






[jira] [Reopened] (IMPALA-11196) Assertion failure in ClientCacheTest.MemLeak ASAN build

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker reopened IMPALA-11196:


> Assertion failure in ClientCacheTest.MemLeak ASAN build
> ---
>
> Key: IMPALA-11196
> URL: https://issues.apache.org/jira/browse/IMPALA-11196
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Yida Wu
>Assignee: Yida Wu
>Priority: Blocker
>  Labels: broken-build
>
> The test ClientCacheTest.MemLeak, introduced in IMPALA-11176, fails in ASAN 
> and TSAN build.
> h3. Error Message
> Value of: mem_after Actual: 22012933906432 Expected: mem_before Which is: 
> 22012768583680
> h3. Stacktrace
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/client-cache-test.cc:112
>  Value of: mem_after Actual: 22012933906432 Expected: mem_before Which is: 
> 22012768583680






[jira] [Reopened] (IMPALA-11193) Assertion fails in ClientCacheTest.MemLeak

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker reopened IMPALA-11193:


> Assertion fails in ClientCacheTest.MemLeak
> --
>
> Key: IMPALA-11193
> URL: https://issues.apache.org/jira/browse/IMPALA-11193
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Daniel Becker
>Assignee: Yida Wu
>Priority: Blocker
>  Labels: broken-build
>
> The test {*}ClientCacheTest.MemLeak{*}, introduced in IMPALA-11176, fails in 
> several internal builds.
> h3. Error Message
> {code:java}
> Expected: (mem_before) > (0), actual: 0 vs 0{code}
> h3. Stacktrace
> {code:java}
> /data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/runtime/client-cache-test.cc:100
> Expected: (mem_before) > (0), actual: 0 vs 0{code}
> Interestingly it is not the main assert that fails but a "precondition", 
> namely EXPECT_GT(mem_before, 0).






[jira] [Resolved] (IMPALA-11193) Assertion fails in ClientCacheTest.MemLeak

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11193.

Fix Version/s: Impala 4.1.0
   Resolution: Fixed

> Assertion fails in ClientCacheTest.MemLeak
> --
>
> Key: IMPALA-11193
> URL: https://issues.apache.org/jira/browse/IMPALA-11193
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Daniel Becker
>Assignee: Yida Wu
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 4.1.0
>
>
> The test {*}ClientCacheTest.MemLeak{*}, introduced in IMPALA-11176, fails in 
> several internal builds.
> h3. Error Message
> {code:java}
> Expected: (mem_before) > (0), actual: 0 vs 0{code}
> h3. Stacktrace
> {code:java}
> /data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/runtime/client-cache-test.cc:100
> Expected: (mem_before) > (0), actual: 0 vs 0{code}
> Interestingly it is not the main assert that fails but a "precondition", 
> namely EXPECT_GT(mem_before, 0).






[jira] [Reopened] (IMPALA-9496) Allow Struct type in SELECT list for Parquet tables

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker reopened IMPALA-9496:
---

> Allow Struct type in SELECT list for Parquet tables
> ---
>
> Key: IMPALA-9496
> URL: https://issues.apache.org/jira/browse/IMPALA-9496
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Reporter: Gabor Kaszab
>Assignee: Gabor Kaszab
>Priority: Major
>  Labels: complextype
>







[jira] [Resolved] (IMPALA-9496) Allow Struct type in SELECT list for Parquet tables

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-9496.
---
Fix Version/s: Impala 4.2.0
   Resolution: Fixed

> Allow Struct type in SELECT list for Parquet tables
> ---
>
> Key: IMPALA-9496
> URL: https://issues.apache.org/jira/browse/IMPALA-9496
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Reporter: Gabor Kaszab
>Assignee: Gabor Kaszab
>Priority: Major
>  Labels: complextype
> Fix For: Impala 4.2.0
>
>







[jira] [Resolved] (IMPALA-10508) Add metrics for reading from remote scratch paths

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-10508.

Resolution: Fixed

> Add metrics for reading from remote scratch paths
> -
>
> Key: IMPALA-10508
> URL: https://issues.apache.org/jira/browse/IMPALA-10508
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Yida Wu
>Assignee: Yida Wu
>Priority: Minor
> Fix For: Impala 4.3.0
>
>
> For reading data from a remote scratch path, the data can be fetched from the 
> local buffer if the file hasn't been uploaded yet, or fetched from remote 
> filesystem.
> The metrics can help to identify how much data is read from the local buffer, 
> how much is from the remote filesystem.






[jira] [Reopened] (IMPALA-10508) Add metrics for reading from remote scratch paths

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker reopened IMPALA-10508:


> Add metrics for reading from remote scratch paths
> -
>
> Key: IMPALA-10508
> URL: https://issues.apache.org/jira/browse/IMPALA-10508
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Yida Wu
>Assignee: Yida Wu
>Priority: Minor
>
> For reading data from a remote scratch path, the data can be fetched from the 
> local buffer if the file hasn't been uploaded yet, or fetched from remote 
> filesystem.
> The metrics can help to identify how much data is read from the local buffer, 
> how much is from the remote filesystem.






[jira] [Updated] (IMPALA-10508) Add metrics for reading from remote scratch paths

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-10508:
---
Fix Version/s: Impala 4.3.0

> Add metrics for reading from remote scratch paths
> -
>
> Key: IMPALA-10508
> URL: https://issues.apache.org/jira/browse/IMPALA-10508
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Yida Wu
>Assignee: Yida Wu
>Priority: Minor
> Fix For: Impala 4.3.0
>
>
> For reading data from a remote scratch path, the data can be fetched from the 
> local buffer if the file hasn't been uploaded yet, or fetched from remote 
> filesystem.
> The metrics can help to identify how much data is read from the local buffer, 
> how much is from the remote filesystem.






[jira] [Updated] (IMPALA-11400) Kudu scan bottleneck due to sharing a single Kudu client for multiple tablet scans

2022-11-23 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11400:
---
Fix Version/s: Impala 4.3.0
   (was: Impala 4.2.0)

> Kudu scan bottleneck due to sharing a single Kudu client for multiple tablet 
> scans
> --
>
> Key: IMPALA-11400
> URL: https://issues.apache.org/jira/browse/IMPALA-11400
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.1.0
>Reporter: Sameera Wijerathne
>Priority: Major
>  Labels: performance
> Fix For: Impala 4.3.0
>
> Attachments: 0.JPG, 1.JPG, 2-1.jpeg, 2.JPG, 2.jpeg, 3.JPG, 4.JPG, 
> 5.JPG, Impala_1.png, Impala_2.png, Kudu_1.png, Kudu_2.png, WhatsApp Image 
> 2022-06-07 at 10.39.27 PM.jpeg
>
>
> This issue was observed when Impala queries large datasets residing in Kudu. 
> Even when a single ImpalaD scans multiple Kudu tablets with parallel scans, 
> data retrieval is slow. The reason is that ImpalaD uses only a single Kudu 
> client for multiple scans, and KuduScanner::NextBatch runs on a single thread, 
> so the client's RPC reactor thread saturates a single core and bottlenecks all 
> parallel scans. 
> As a result, Impala clusters that scan Kudu cannot be vertically scaled to the 
> maximum performance/cores of a node.
> Please refer to the screenshots from the Kudu Slack channel for more 
> information.
>  
> !2-1.jpeg|width=717,height=961!
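One direction for spreading the load, sketched below under the assumption that tablet scans can be distributed across a pool of clients (the function and names are hypothetical, not the actual Impala/Kudu API):

```python
from itertools import cycle

def assign_scan_tokens(scan_tokens, clients):
    """Round-robin scan tokens over a pool of clients so that no single
    client's RPC reactor thread has to serve every tablet scan."""
    assignment = {c: [] for c in clients}
    for token, client in zip(scan_tokens, cycle(clients)):
        assignment[client].append(token)
    return assignment

# Toy illustration with strings standing in for clients and scan tokens.
plan = assign_scan_tokens(["t0", "t1", "t2", "t3"], ["client-0", "client-1"])
# client-0 handles t0 and t2; client-1 handles t1 and t3.
```

With each client owning its own reactor thread, the per-core ceiling applies per client rather than per ImpalaD.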






[jira] [Updated] (IMPALA-11739) Skip pushing down BinaryPredicate with NullLiteral for Iceberg tables

2022-11-23 Thread gaoxiaoqing (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaoxiaoqing updated IMPALA-11739:
-
Summary: Skip pushing down BinaryPredicate with NullLiteral for Iceberg 
tables  (was: Iceberg partitioning column cannot be filtered not equal NULL or 
equal NULL)

> Skip pushing down BinaryPredicate with NullLiteral for Iceberg tables
> -
>
> Key: IMPALA-11739
> URL: https://issues.apache.org/jira/browse/IMPALA-11739
> Project: IMPALA
>  Issue Type: Bug
>Reporter: gaoxiaoqing
>Assignee: gaoxiaoqing
>Priority: Major
>  Labels: iceberg
> Fix For: Impala 4.2.0
>
>
> The following query throws an exception, "ERROR: ClassCastException: 
> org.apache.impala.analysis.NullLiteral cannot be cast to 
> org.apache.impala.analysis.StringLiteral":
> {noformat}
> select * from iceberg_alltypes_part where p_string!=NULL;{noformat}
> The same not-equal-NULL and equal-NULL filters on an HDFS table succeed; 
> Iceberg tables should behave the same way.






[jira] [Commented] (IMPALA-10001) Find good default value for SORT_RUN_BYTES_LIMIT

2022-11-23 Thread Noemi Pap-Takacs (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637625#comment-17637625
 ] 

Noemi Pap-Takacs commented on IMPALA-10001:
---

To be precise, there is a flag 'enforce_sort_run_bytes_limit' that double-checks 
whether SORT_RUN_BYTES_LIMIT should really trigger a spill.
If the estimated sort data size is lower than the reservation, the sorter 
predicts that everything will fit in memory and there is no need to spill. In 
that case SORT_RUN_BYTES_LIMIT is ignored and does not force early spilling. If 
spilling does happen because the estimate was too low, 
'enforce_sort_run_bytes_limit' is set to true, and from that point on 
SORT_RUN_BYTES_LIMIT regularly enforces spilling.

This means that with a good input size estimate, SORT_RUN_BYTES_LIMIT can avoid 
unnecessary spilling when the query can actually fit all data in memory.
However, when enforcing SORT_RUN_BYTES_LIMIT is necessary, the I/O overhead is 
still present. 
I think Tim's idea in IMPALA-4530 would be the best solution to this problem. 
He proposed "to produce many smaller in-memory runs, sort those with 
quicksort, then do merge-sort" in memory, instead of forming only one big run 
or spilling many small runs.
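The quoted proposal from IMPALA-4530 can be illustrated with a toy sketch; for simplicity the run limit below is measured in elements rather than bytes:

```python
import heapq

def sort_in_runs(values, run_limit):
    """Sort 'values' by forming several bounded in-memory runs, sorting
    each one (Python's built-in sort stands in for quicksort), then
    merge-sorting the runs in memory instead of spilling them to disk."""
    runs = [sorted(values[i:i + run_limit])
            for i in range(0, len(values), run_limit)]
    return list(heapq.merge(*runs))

result = sort_in_runs([5, 3, 8, 1, 9, 2, 7], run_limit=3)
# -> [1, 2, 3, 5, 7, 8, 9]
```

The merge step touches each element once, so the approach keeps the cache-friendliness of small sorted runs without paying the I/O cost of spilling them.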

> Find good default value for SORT_RUN_BYTES_LIMIT
> 
>
> Key: IMPALA-10001
> URL: https://issues.apache.org/jira/browse/IMPALA-10001
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Perf Investigation
>Reporter: Riza Suminto
>Priority: Minor
>
> IMPALA-6692 add query option SORT_RUN_BYTES_LIMIT to trigger early sort 
> before the query hit memory limit.
> Currently, it is disabled as default. We need to find a good default value 
> for this query option.


