[jira] [Commented] (IMPALA-10785) when union kudu table and hdfs table, union passthrough does not take effect

2021-07-20 Thread pengdou1990 (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384646#comment-17384646
 ] 

pengdou1990 commented on IMPALA-10785:
--

The union output tuple layout depends on its first child node:

[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java#L599]

Each slot's isNullable_ value and null indicator depend on the corresponding 
slots of all child nodes:
[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java#L644]

In a Kudu table tuple, the primary key slot's isNullable_ = false.
In an HDFS table tuple, every slot's isNullable_ = true.

When a Kudu table is unioned with an HDFS table, the tuple memory layout may 
follow the HDFS table's tuple memory layout (a string slot is 12 bytes, without 
padding), while the isNullable_ values of the slots follow the Kudu table's.


Neither the HDFS table nor the Kudu table passes the isChildPassthrough check, so 
passthrough does not take effect.
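
The mismatch described in this comment can be illustrated with a simplified model. This is only a sketch: `Slot`, `is_layout_compatible`, and the byte offsets are hypothetical stand-ins, not Impala's actual isChildPassthrough implementation. Only the 12-byte and 16-byte string slot sizes and the nullability values are taken from this issue.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Slot:
    """Hypothetical, simplified view of one slot in a tuple layout."""
    byte_size: int
    byte_offset: int
    nullable: bool


def is_layout_compatible(union_slots: List[Slot], child_slots: List[Slot]) -> bool:
    """In this model a child is passed through only if every slot matches exactly."""
    return len(union_slots) == len(child_slots) and all(
        u == c for u, c in zip(union_slots, child_slots)
    )


# One string primary-key column as laid out by each node (illustrative values,
# following the slot sizes quoted in this issue).
hdfs_child = [Slot(byte_size=12, byte_offset=0, nullable=True)]
kudu_child = [Slot(byte_size=16, byte_offset=0, nullable=False)]
# Union output as described above: HDFS-style size, Kudu-style nullability.
union_out = [Slot(byte_size=12, byte_offset=0, nullable=False)]

hdfs_ok = is_layout_compatible(union_out, hdfs_child)
kudu_ok = is_layout_compatible(union_out, kudu_child)
print(hdfs_ok, kudu_ok)
```

In this model the HDFS child fails on nullability and the Kudu child fails on slot size, so neither qualifies for passthrough.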

> when union kudu table and hdfs table, union passthrough does not take effect
> 
>
> Key: IMPALA-10785
> URL: https://issues.apache.org/jira/browse/IMPALA-10785
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: pengdou1990
>Priority: Major
>
> IMPALA-3586 already supports union passthrough and brings great performance 
> improvements for unions, but there are still some problems with a union between 
> an hdfs table and a kudu table. Several points cause the problem:
>  # in the kudu scanner node output TupleDescriptor, a string slot is 16B, while in 
> the hdfs scanner node output TupleDescriptor, a string slot is 12B, causing a tuple 
> memory layout mismatch
>  # in the kudu scanner node output TupleDescriptor, a string slot is 16B, while in 
> the Union output TupleDescriptor, a string slot is 12B, causing a tuple memory layout 
> mismatch
>  # in the Kudu scan node, the row key slot is not nullable, while in the hdfs node, 
> non-nullable slots cannot be derived from the metadata, causing a tuple memory layout mismatch
> I have resolved the 1st and 2nd points; how should I handle the 3rd point?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10627) Use standard Iceberg table properties

2021-07-20 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384539#comment-17384539
 ] 

ASF subversion and git services commented on IMPALA-10627:
--

Commit fabe994d1fb011afb88d1f0f5bf078113775c9db in impala's branch 
refs/heads/master from Attila Jeges
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=fabe994 ]

IMPALA-10627: Use standard parquet-related Iceberg table properties

This patch adds support for the following standard Iceberg properties:

write.parquet.compression-codec:
  Parquet compression codec. Supported values are: NONE, GZIP, SNAPPY
  (default value), LZ4, ZSTD. The table property will be ignored if
  COMPRESSION_CODEC query option is set.

write.parquet.compression-level:
  Parquet compression level. Used with ZSTD compression only.
  Supported range is [1, 22]. Default value is 3. The table property
  will be ignored if COMPRESSION_CODEC query option is set.

write.parquet.row-group-size-bytes:
  Parquet row group size in bytes. Supported range is [8388608,
  2146435072] (8MB - 2047MB). The table property will be ignored if
  PARQUET_FILE_SIZE query option is set.
  If neither the table property nor the PARQUET_FILE_SIZE query option
  is set, the way Impala calculates row group size will remain
  unchanged.

write.parquet.page-size-bytes:
  Parquet page size in bytes. Used for PLAIN encoding. Supported range
  is [65536, 1073741824] (64KB - 1GB).
  If the table property is unset, the way Impala calculates page size
  will remain unchanged.

write.parquet.dict-size-bytes:
  Parquet dictionary page size in bytes. Used for dictionary encoding.
  Supported range is [65536, 1073741824] (64KB - 1GB).
  If the table property is unset, the way Impala calculates dictionary
  page size will remain unchanged.

This patch also renames 'iceberg.file_format' table property to
'write.format.default' which is the standard Iceberg name for the
table property.

Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
Reviewed-on: http://gerrit.cloudera.org:8080/17654
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
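
The supported values listed in this commit message can be summarised as a small validation sketch. The function name `validate_parquet_props`, the error strings, and the case-insensitive codec comparison are assumptions made for illustration; only the property names, codec list, and numeric ranges come from the commit message. This is not Impala's actual validation code.

```python
from typing import Dict, List

# Codec names and inclusive ranges copied from the commit message above.
SUPPORTED_CODECS = {"NONE", "GZIP", "SNAPPY", "LZ4", "ZSTD"}
RANGES = {
    "write.parquet.compression-level": (1, 22),
    "write.parquet.row-group-size-bytes": (8388608, 2146435072),
    "write.parquet.page-size-bytes": (65536, 1073741824),
    "write.parquet.dict-size-bytes": (65536, 1073741824),
}


def validate_parquet_props(props: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    codec = props.get("write.parquet.compression-codec")
    # Assumption: codec names are compared case-insensitively.
    if codec is not None and codec.upper() not in SUPPORTED_CODECS:
        errors.append(f"unsupported codec: {codec}")
    for key, (low, high) in RANGES.items():
        raw = props.get(key)
        if raw is None:
            continue  # unset properties keep the existing behaviour
        try:
            value = int(raw)
        except ValueError:
            errors.append(f"{key} is not an integer: {raw}")
            continue
        if not low <= value <= high:
            errors.append(f"{key}={value} is outside [{low}, {high}]")
    return errors


good = validate_parquet_props(
    {"write.parquet.compression-codec": "ZSTD", "write.parquet.compression-level": "3"}
)
bad = validate_parquet_props({"write.parquet.page-size-bytes": "1024"})
print(good, bad)
```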


> Use standard Iceberg table properties
> -
>
> Key: IMPALA-10627
> URL: https://issues.apache.org/jira/browse/IMPALA-10627
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Attila Jeges
>Priority: Major
>  Labels: impala-iceberg
>
> Iceberg lists the following properties:
> [https://iceberg.apache.org/configuration/]
> We should also use these properties if possible, e.g. write.format.default, 
> write..compression-codec
> Currently Impala uses the table property 'iceberg.file_format' to determine 
> the data file format for reads/writes. In the future, read operations should 
> automatically detect the file formats (IMPALA-10610), but for writes we 
> should use 'write.format.default'.






[jira] [Commented] (IMPALA-10754) test_overlap_min_max_filters_on_sorted_columns failed during GVO

2021-07-20 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384538#comment-17384538
 ] 

ASF subversion and git services commented on IMPALA-10754:
--

Commit 147b4b9e583098f9611fe28fc9ff1f8451f63e4b in impala's branch 
refs/heads/master from Qifan Chen
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=147b4b9 ]

IMPALA-10754: test_overlap_min_max_filters_on_sorted_columns failed during GVO

This patch addresses a failure in ubuntu-16.04 dockerised test. The test
involved is found in overlap_min_max_filters_on_sorted_columns.test as
follows.

  set minmax_filter_fast_code_path=on;
  set MINMAX_FILTER_THRESHOLD=0.0;
  SET RUNTIME_FILTER_WAIT_TIME_MS=$RUNTIME_FILTER_WAIT_TIME_MS;
  select straight_join count(a.timestamp_col) from
  alltypes_timestamp_col_only a join [SHUFFLE] alltypes_limited b
  where a.timestamp_col = b.timestamp_col and b.tinyint_col = 4;
   RUNTIME_PROFILE
  aggregation(SUM, NumRuntimeFilteredPages)> 57

The patch reduces the threshold from 58 to 50.

Testing:
   Ran the unit test successfully.

Change-Id: Icb4cc7d533139c4a2b46a872234a47d46cb8a17c
Reviewed-on: http://gerrit.cloudera.org:8080/17696
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
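
The kind of check this test performs can be sketched as follows. The helper names and the sample profile text are hypothetical, and the per-node counter values are made up so that they add up to the 59 reported in the stack trace. The sketch assumes counters appear as `Name: value` in the profile text; it is not the code in test_result_verifier.py.

```python
import re


def sum_counter(profile_text: str, counter: str) -> int:
    """Sum every occurrence of `counter: <int>` found in the profile text."""
    pattern = re.compile(r"\b" + re.escape(counter) + r":\s*(\d+)")
    return sum(int(m.group(1)) for m in pattern.finditer(profile_text))


def passes_threshold(profile_text: str, counter: str, threshold: int) -> bool:
    """A lower-bound check tolerates small run-to-run variation in the counter."""
    return sum_counter(profile_text, counter) > threshold


# Hypothetical profile fragment with two scan node instances.
sample_profile = """
  HDFS_SCAN_NODE (id=0):
     - NumRuntimeFilteredPages: 29 (29)
  HDFS_SCAN_NODE (id=0):
     - NumRuntimeFilteredPages: 30 (30)
"""

total = sum_counter(sample_profile, "NumRuntimeFilteredPages")
ok = passes_threshold(sample_profile, "NumRuntimeFilteredPages", 50)
print(total, ok)
```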


> test_overlap_min_max_filters_on_sorted_columns failed during GVO
> 
>
> Key: IMPALA-10754
> URL: https://issues.apache.org/jira/browse/IMPALA-10754
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Qifan Chen
>Priority: Major
>  Labels: broken-build
> Fix For: Impala 4.1
>
>
> test_overlap_min_max_filters_on_sorted_columns failed in the following build:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/
> *Stack trace:*
> {noformat}
> query_test/test_runtime_filters.py:296: in 
> test_overlap_min_max_filters_on_sorted_columns
> test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
> common/impala_test_suite.py:734: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:653: in verify_runtime_profile
> % (function, field, expected_value, actual_value, op, actual))
> E   AssertionError: Aggregation of SUM over NumRuntimeFilteredPages did not 
> match expected results.
> E   EXPECTED VALUE:
> E   58
> E   
> E   
> E   ACTUAL VALUE:
> E   59
> {noformat}






[jira] [Created] (IMPALA-10815) Ignore events on non-default hive catalogs

2021-07-20 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10815:


 Summary: Ignore events on non-default hive catalogs
 Key: IMPALA-10815
 URL: https://issues.apache.org/jira/browse/IMPALA-10815
 Project: IMPALA
  Issue Type: Bug
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Hive 3 introduces a new object called a catalog, which acts as a namespace for 
databases and tables. Currently, Impala does not support Hive catalogs. However, 
if there are events on non-default catalogs, the events processor applies 
them in catalogd whenever the database and table names match. Until we 
support custom Hive catalogs, we should ignore events coming from 
non-default catalog objects.
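
A minimal sketch of the proposed filtering, under stated assumptions: events are modelled as plain dictionaries with a hypothetical `cat_name` key, the default catalog is assumed to be named "hive", and events with no catalog name are treated as belonging to the default catalog. None of these names are taken from Impala's events processor.

```python
from typing import Dict, List, Optional

DEFAULT_CATALOG = "hive"  # assumed name of the default Hive catalog


def is_default_catalog(cat_name: Optional[str]) -> bool:
    """Events without a catalog name are assumed to target the default catalog."""
    return cat_name is None or cat_name.lower() == DEFAULT_CATALOG


def filter_events(events: List[Dict]) -> List[Dict]:
    """Keep only events whose catalog is the default one."""
    return [e for e in events if is_default_catalog(e.get("cat_name"))]


events = [
    {"id": 1, "cat_name": "hive", "db": "default", "table": "t1"},
    # Same database and table name, but in a custom catalog: must be ignored.
    {"id": 2, "cat_name": "custom", "db": "default", "table": "t1"},
    {"id": 3, "db": "default", "table": "t2"},
]
kept_ids = [e["id"] for e in filter_events(events)]
print(kept_ids)
```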








[jira] [Resolved] (IMPALA-10468) DROP events which are generated while a batch is being processed may add table incorrectly

2021-07-20 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-10468.
--
Fix Version/s: Impala 4.1
   Resolution: Duplicate

> DROP events which are generated while a batch is being processed may add 
> table incorrectly
> --
>
> Key: IMPALA-10468
> URL: https://issues.apache.org/jira/browse/IMPALA-10468
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.1
>
>
> One of the problems with CREATE/DROP events is that they may occur while a 
> batch is being processed, and hence the EventsProcessor may not be aware of them.
> For example, consider the following sequence of statements:
> create table foo (c1 int);
> drop table foo;
> create table foo (c2 int);
> drop table foo;
> These statements generate the event sequence CREATE_TABLE, DROP_TABLE, 
> CREATE_TABLE, DROP_TABLE. Generally, if all 4 events are fetched in one 
> batch, the first and third events (both CREATE_TABLE) are ignored because 
> each is followed by a DROP_TABLE in the sequence, and the DROP_TABLE events 
> have no effect since the table doesn't exist in catalogd anymore.
> However, if the events processor fetches these events in 2 batches (3 and 1), 
> then after the first batch of CREATE_TABLE, DROP_TABLE, CREATE_TABLE is 
> processed, the third event adds the table foo to catalogd. The 
> subsequent batch's DROP_TABLE is then processed and removes the table, but 
> between the two batches catalogd reports that a table called foo exists. 
> This can cause statements to fail. E.g. a statement like create 
> table foo (c3 int) issued after the above statements fails with a 
> TableAlreadyExists error.
> The problem happens for databases too. So far I have not been able to 
> reproduce this for partitions, but I don't see why it would not happen with 
> partitions as well.
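
The batching effect described in this issue can be reproduced with a toy model. This is a sketch under assumptions: `process_batch` is a hypothetical simplification in which a CREATE_TABLE is skipped only when a later DROP_TABLE for the same table is visible in the same batch. It is not the actual EventsProcessor logic.

```python
from typing import List, Set, Tuple

Event = Tuple[str, str]  # (event type, table name)


def process_batch(catalog: Set[str], batch: List[Event]) -> None:
    """Apply one batch of events to a toy catalog (a set of table names)."""
    for i, (kind, table) in enumerate(batch):
        if kind == "CREATE_TABLE":
            # Only events within the current batch are visible here.
            dropped_later = any(
                k == "DROP_TABLE" and t == table for k, t in batch[i + 1:]
            )
            if not dropped_later:
                catalog.add(table)
        elif kind == "DROP_TABLE":
            catalog.discard(table)


events = [
    ("CREATE_TABLE", "foo"),
    ("DROP_TABLE", "foo"),
    ("CREATE_TABLE", "foo"),
    ("DROP_TABLE", "foo"),
]

# Case 1: all four events arrive in a single batch.
single = set()
process_batch(single, events)

# Case 2: the same events arrive as batches of 3 and 1.
split = set()
process_batch(split, events[:3])
between_batches = set(split)  # what the toy catalog reports between batches
process_batch(split, events[3:])

print(single, between_batches, split)
```

Between the two batches the toy catalog contains foo, which is the window in which a new create table foo statement would fail.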






[jira] [Resolved] (IMPALA-10490) truncate table fails with IllegalStateException

2021-07-20 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-10490.
--
Fix Version/s: Impala 4.1
   Resolution: Fixed

> truncate table fails with IllegalStateException
> ---
>
> Key: IMPALA-10490
> URL: https://issues.apache.org/jira/browse/IMPALA-10490
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.1
>
>
> This is a problem when events processing is turned on. I can reproduce it 
> with the following steps.
> 1. start impala without events processing
> 2. create table, load data, compute stats on the table.
> 3. restart impala with events processing turned on
> 4. Run truncate table command.
> I can see that the truncate table command fails with the following error.
> [localhost:21050] default> truncate t5;
> Query: truncate t5
> ERROR: CatalogException: Failed to truncate table: default.t5.
> Table may be in a partially truncated state.
> CAUSED BY: IllegalStateException: Table parameters must have catalog service 
> identifier before adding it to partition parameters






[jira] [Commented] (IMPALA-10785) when union kudu table and hdfs table, union passthrough does not take effect

2021-07-20 Thread Qifan Chen (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384435#comment-17384435
 ] 

Qifan Chen commented on IMPALA-10785:
-

For 3), SlotDescriptor in the FE has a field called isNullable_ 
(https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java#L66).
  

It seems that isNullable_ of the union slot should be set to the isNullable_ 
value of the column in the hdfs table when the corresponding column in the kudu 
table is not nullable (isNullable_ is false). 
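
One possible reading of this suggestion is that a union slot is nullable whenever any child's corresponding slot is nullable. The sketch below assumes that reading; `union_nullability` and the two-column example are hypothetical and do not reproduce the logic in SetOperationStmt.java.

```python
from typing import List


def union_nullability(children: List[List[bool]]) -> List[bool]:
    """Per column: nullable in the union if nullable in at least one child."""
    return [any(flags) for flags in zip(*children)]


# Column 0 is the Kudu primary key (not nullable); HDFS slots are all nullable.
kudu_nullable = [False, True]
hdfs_nullable = [True, True]

merged = union_nullability([kudu_nullable, hdfs_nullable])
print(merged)
```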

> when union kudu table and hdfs table, union passthrough does not take effect
> 
>
> Key: IMPALA-10785
> URL: https://issues.apache.org/jira/browse/IMPALA-10785
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: pengdou1990
>Priority: Major
>
> IMPALA-3586 already supports union passthrough and brings great performance 
> improvements for unions, but there are still some problems with a union between 
> an hdfs table and a kudu table. Several points cause the problem:
>  # in the kudu scanner node output TupleDescriptor, a string slot is 16B, while in 
> the hdfs scanner node output TupleDescriptor, a string slot is 12B, causing a tuple 
> memory layout mismatch
>  # in the kudu scanner node output TupleDescriptor, a string slot is 16B, while in 
> the Union output TupleDescriptor, a string slot is 12B, causing a tuple memory layout 
> mismatch
>  # in the Kudu scan node, the row key slot is not nullable, while in the hdfs node, 
> non-nullable slots cannot be derived from the metadata, causing a tuple memory layout mismatch
> I have resolved the 1st and 2nd points; how should I handle the 3rd point?






[jira] [Updated] (IMPALA-10814) Hit DCHECK in DecimalUtil::DecodeFromFixedLenByteArray for core-s3 build

2021-07-20 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou updated IMPALA-10814:
-
Labels: broken-build  (was: )

> Hit DCHECK in DecimalUtil::DecodeFromFixedLenByteArray for core-s3 build
> 
>
> Key: IMPALA-10814
> URL: https://issues.apache.org/jira/browse/IMPALA-10814
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.1
>Reporter: Wenzhe Zhou
>Priority: Major
>  Labels: broken-build
>
> Saw this build failure in asf-master-core-s3 build: 
> [https://master-03.jenkins.cloudera.com/job/impala-asf-master-core-s3/61/]
>  
> Error Message
> DCHECK found in log file: 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/ee_tests/impalad.FATAL
> h3. Standard Error
> Log file created at: 2021/07/19 18:41:06 Running on machine:
> [impala-ec2-centos74-m5-4xlarge-ondemand-072f.vpc.cloudera.com|http://impala-ec2-centos74-m5-4xlarge-ondemand-072f.vpc.cloudera.com/]
> Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
> F0719 18:41:06.730994 4601 decimal-util.h:129] 
> fb4b98709a88f345:b51bf00b0002] Check failed: fixed_len_size > 0 (-15 vs. 
> 0)
> F0719 18:41:08.161149 4711 decimal-util.h:129] 
> e5432b6d3730539d:cf6c2d310002] Check failed: fixed_len_size > 0 (-15 vs. 
> 0)
> From timestamp, the issue seems happened in test 
> query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_uncompressed_parquet_orc






[jira] [Updated] (IMPALA-10814) Hit DCHECK in DecimalUtil::DecodeFromFixedLenByteArray for core-s3 build

2021-07-20 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou updated IMPALA-10814:
-
Description: 
Saw this build failure in asf-master-core-s3 build: 

[https://master-03.jenkins.cloudera.com/job/impala-asf-master-core-s3/61/]

 

*Error Message*

DCHECK found in log file: 
/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/ee_tests/impalad.FATAL
h3. Standard Error

Log file created at: 2021/07/19 18:41:06 Running on machine:

[impala-ec2-centos74-m5-4xlarge-ondemand-072f.vpc.cloudera.com|http://impala-ec2-centos74-m5-4xlarge-ondemand-072f.vpc.cloudera.com/]

Log line format: [IWEF]mmdd hh:mm:ss.uu threadid [file:line|file:///line]] 
msg

F0719 18:41:06.730994 4601 decimal-util.h:129] 
fb4b98709a88f345:b51bf00b0002] Check failed: fixed_len_size > 0 (-15 vs. 0)

F0719 18:41:08.161149 4711 decimal-util.h:129] 
e5432b6d3730539d:cf6c2d310002] Check failed: fixed_len_size > 0 (-15 vs. 0)

 

From timestamp, the issue seems happened in test: 
query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_uncompressed_parquet_orc

  was:
Saw this build failure in asf-master-core-s3 build: 

[https://master-03.jenkins.cloudera.com/job/impala-asf-master-core-s3/61/]

 

Error Message

DCHECK found in log file: 
/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/ee_tests/impalad.FATAL
h3. Standard Error

Log file created at: 2021/07/19 18:41:06 Running on machine:

[impala-ec2-centos74-m5-4xlarge-ondemand-072f.vpc.cloudera.com|http://impala-ec2-centos74-m5-4xlarge-ondemand-072f.vpc.cloudera.com/]

Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg

F0719 18:41:06.730994 4601 decimal-util.h:129] 
fb4b98709a88f345:b51bf00b0002] Check failed: fixed_len_size > 0 (-15 vs. 0)

F0719 18:41:08.161149 4711 decimal-util.h:129] 
e5432b6d3730539d:cf6c2d310002] Check failed: fixed_len_size > 0 (-15 vs. 0)

From timestamp, the issue seems happened in test 

query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_uncompressed_parquet_orc


> Hit DCHECK in DecimalUtil::DecodeFromFixedLenByteArray for core-s3 build
> 
>
> Key: IMPALA-10814
> URL: https://issues.apache.org/jira/browse/IMPALA-10814
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.1
>Reporter: Wenzhe Zhou
>Priority: Major
>  Labels: broken-build
>
> Saw this build failure in asf-master-core-s3 build: 
> [https://master-03.jenkins.cloudera.com/job/impala-asf-master-core-s3/61/]
>  
> *Error Message*
> DCHECK found in log file: 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/ee_tests/impalad.FATAL
> h3. Standard Error
> Log file created at: 2021/07/19 18:41:06 Running on machine:
> [impala-ec2-centos74-m5-4xlarge-ondemand-072f.vpc.cloudera.com|http://impala-ec2-centos74-m5-4xlarge-ondemand-072f.vpc.cloudera.com/]
> Log line format: [IWEF]mmdd hh:mm:ss.uu threadid 
> [file:line|file:///line]] msg
> F0719 18:41:06.730994 4601 decimal-util.h:129] 
> fb4b98709a88f345:b51bf00b0002] Check failed: fixed_len_size > 0 (-15 vs. 
> 0)
> F0719 18:41:08.161149 4711 decimal-util.h:129] 
> e5432b6d3730539d:cf6c2d310002] Check failed: fixed_len_size > 0 (-15 vs. 
> 0)
>  
> From timestamp, the issue seems happened in test: 
> query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_uncompressed_parquet_orc






[jira] [Created] (IMPALA-10814) Hit DCHECK in DecimalUtil::DecodeFromFixedLenByteArray for core-s3 build

2021-07-20 Thread Wenzhe Zhou (Jira)
Wenzhe Zhou created IMPALA-10814:


 Summary: Hit DCHECK in DecimalUtil::DecodeFromFixedLenByteArray 
for core-s3 build
 Key: IMPALA-10814
 URL: https://issues.apache.org/jira/browse/IMPALA-10814
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.1
Reporter: Wenzhe Zhou


Saw this build failure in asf-master-core-s3 build: 

[https://master-03.jenkins.cloudera.com/job/impala-asf-master-core-s3/61/]

 

Error Message

DCHECK found in log file: 
/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/ee_tests/impalad.FATAL
h3. Standard Error

Log file created at: 2021/07/19 18:41:06 Running on machine:

[impala-ec2-centos74-m5-4xlarge-ondemand-072f.vpc.cloudera.com|http://impala-ec2-centos74-m5-4xlarge-ondemand-072f.vpc.cloudera.com/]

Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg

F0719 18:41:06.730994 4601 decimal-util.h:129] 
fb4b98709a88f345:b51bf00b0002] Check failed: fixed_len_size > 0 (-15 vs. 0)

F0719 18:41:08.161149 4711 decimal-util.h:129] 
e5432b6d3730539d:cf6c2d310002] Check failed: fixed_len_size > 0 (-15 vs. 0)

From the timestamp, the issue seems to have happened in test 

query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_uncompressed_parquet_orc
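
The failed check, fixed_len_size > 0 (-15 vs. 0), guards the length of a fixed-length byte array before a decimal is decoded from it. The sketch below shows why such a guard is needed when the length comes from a possibly corrupted file. It is a hypothetical illustration, not Impala's DecimalUtil code, and it relies only on the fact that Parquet stores fixed-length decimals as big-endian two's complement integers.

```python
def decode_decimal(buf: bytes, fixed_len_size: int) -> int:
    """Decode the unscaled value of a big-endian two's complement decimal.

    Raises ValueError instead of crashing when the length is unusable,
    for example a non-positive length read from a corrupted file.
    """
    if fixed_len_size <= 0 or fixed_len_size > len(buf):
        raise ValueError(f"invalid fixed_len_size: {fixed_len_size}")
    return int.from_bytes(buf[:fixed_len_size], byteorder="big", signed=True)


value = decode_decimal(b"\xff\x85", 2)

try:
    decode_decimal(b"\xff\x85", -15)
    rejected = False
except ValueError:
    rejected = True

print(value, rejected)
```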






[jira] [Commented] (IMPALA-10502) delayed 'Invalidated objects in cache' cause 'Table already exists'

2021-07-20 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384392#comment-17384392
 ] 

ASF subversion and git services commented on IMPALA-10502:
--

Commit 565d0bfa1d12df583ab6d2725ac6ecf2644cd50d in impala's branch 
refs/heads/master from Vihang Karajgaonkar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=565d0bf ]

IMPALA-10502: Fetch events in batches (Addendum)

The earlier change for IMPALA-10502 passes in a batch size
of -1 to fetch all the events from a given event id during a
DDL execution. While this works when HMS backing database is
postgres, it doesn't work well when the HMS backend
is a MySQL database due to HIVE-20226. This change works around the hive
bug to fetch the events in batches of 1000 instead of fetching all the events
in one RPC during the DDL execution.

Testing:
1. Added a unit test for the new changes introduced.
2. Ran the previously failing tests on MySQL HMS backend.

Change-Id: I34bb8984aeb91b37439f77722746f638d8774478
Reviewed-on: http://gerrit.cloudera.org:8080/17698
Reviewed-by: Impala Public Jenkins 
Reviewed-by: Zoltan Borok-Nagy 
Tested-by: Zoltan Borok-Nagy 
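
The batching approach described in this commit message can be sketched as a simple pagination loop. `fetch_all_events` and the `fetch_page` callable are hypothetical; `fetch_page` stands in for whatever RPC returns events after a given event id, and the events are modelled as dictionaries with an `id` key. Only the batch size of 1000 comes from the commit message.

```python
from typing import Callable, Dict, List

BATCH_SIZE = 1000  # batch size quoted in the commit message


def fetch_all_events(
    fetch_page: Callable[[int, int], List[Dict]],
    from_event_id: int,
    batch_size: int = BATCH_SIZE,
) -> List[Dict]:
    """Fetch every event after `from_event_id`, at most `batch_size` per call."""
    events: List[Dict] = []
    last_id = from_event_id
    while True:
        page = fetch_page(last_id, batch_size)
        events.extend(page)
        if len(page) < batch_size:
            break  # a short (or empty) page means nothing is left
        last_id = page[-1]["id"]
    return events


# In-memory stand-in for the metastore, used only to exercise the loop.
all_events = [{"id": i} for i in range(1, 2501)]
calls: List[int] = []


def fake_fetch(last_id: int, max_events: int) -> List[Dict]:
    calls.append(last_id)
    return [e for e in all_events if e["id"] > last_id][:max_events]


result = fetch_all_events(fake_fetch, 0)
print(len(result), calls)
```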


> delayed 'Invalidated objects in cache' cause 'Table already exists'
> ---
>
> Key: IMPALA-10502
> URL: https://issues.apache.org/jira/browse/IMPALA-10502
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Clients, Frontend
>Affects Versions: Impala 3.4.0
>Reporter: Adriano
>Assignee: Vihang Karajgaonkar
>Priority: Critical
> Fix For: Impala 4.1
>
>
> In a fast-paced environment where the interval between steps 1 and 2 is < 
> 100ms (a simplified pipeline looks like):
> 0- catalog 'on demand' in use and disableHmsSync (enabled or disabled: no 
> difference)
> 1- open session to coord A -> DROP TABLE X -> close session
> 2- open session to coord A -> CREATE TABLE X -> close session
> Results: step 2 can fail with "table already exists".
> During the internal investigation it was discovered that IMPALA-9913 will 
> regress the issue in almost all scenarios.
> However, considering that the investigation is still ongoing internally, it is nice 
> to have the event tracked here as well.
> Once we are sure that IMPALA-9913 fixes these events we can close this as a 
> duplicate; otherwise we carry on the investigation.






[jira] [Commented] (IMPALA-10761) Provide query option for illegal UTF-8 characters handling

2021-07-20 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384390#comment-17384390
 ] 

ASF subversion and git services commented on IMPALA-10761:
--

Commit 4df03a31ec77b54138aba2805ff5e376463c8e23 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4df03a3 ]

IMPALA-2019(Part-2): Provide UTF-8 support in instr() and locate()

Similar to the previous patch, this patch adds UTF-8 support in instr()
and locate() builtin functions so they can have consistent behaviors
with Hive's. These two string functions both have an optional argument
as position:
INSTR(STRING str, STRING substr[, BIGINT position[, BIGINT occurrence]])
LOCATE(STRING substr, STRING str[, INT pos])
Their return values are positions of the matched substring.

In UTF-8 mode (turned on by set UTF8_MODE=true), these positions are
counted by UTF-8 characters instead of bytes.

Error handling:
Malformed UTF-8 characters are counted as one byte per character. This
is consistent with Hive since Hive replaces those bytes with U+FFFD
(REPLACEMENT CHARACTER). E.g. GenericUDFInstr calls Text#toString(),
which performs the replacement. We can provide more behaviors on error
handling like ignoring them or reporting errors. IMPALA-10761 will focus
on this.

Tests:
 - Add BE unit tests and e2e tests
 - Add random tests to make sure malformed UTF-8 characters won't crash
   us.

Change-Id: Ic13c3d04649c1aea56c1aaa464799b5e4674f662
Reviewed-on: http://gerrit.cloudera.org:8080/17580
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Provide query option for illegal UTF-8 characters handling
> --
>
> Key: IMPALA-10761
> URL: https://issues.apache.org/jira/browse/IMPALA-10761
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Quanlong Huang
>Priority: Major
>
> There are 3 ways to handle illegal UTF-8 characters:
>  * Replacing them with U+FFFD (REPLACEMENT CHARACTER)
>  * Ignoring them (removing them in the string)
>  * Reporting errors
> We should provide a query option for this.






[jira] [Commented] (IMPALA-10799) Analysis slowdown with inline views and thousands of column

2021-07-20 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384388#comment-17384388
 ] 

ASF subversion and git services commented on IMPALA-10799:
--

Commit bd9b7459d0ab453fa185ba6728e5c571835ffa3e in impala's branch 
refs/heads/master from xqhe
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bd9b745 ]

IMPALA-10799: Analysis slowdown with inline views and thousands of column

If there are thousands of columns in the inline view, analysis is very
slow. Most of the cost is in the get() calls used to find
expressions in the local substitution map when checking if the column
is ambiguous.

The fix is to
1. Use LinkedHashMap to search and check if we have already seen the alias.
2. Run the checkComposedFrom() check only when the log level is TRACE, since
the code has been mature for a while.
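
A minimal Java sketch of the first point follows. It shows why a hash-based
lookup helps when checking thousands of aliases. It is not the code from this
commit: AliasLookup and ambiguousAliases() are hypothetical names, and the
case-insensitive comparison is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: detect repeated aliases with one hash lookup each. */
final class AliasLookup {
  /** Returns the aliases that occur more than once, in first-seen order. */
  static List<String> ambiguousAliases(List<String> aliases) {
    // LinkedHashMap keeps insertion order and gives constant-time lookups on
    // average, so n aliases cost O(n) instead of O(n^2) with list scans.
    Map<String, Integer> seen = new LinkedHashMap<>();
    for (String alias : aliases) {
      // Assumption for this sketch: aliases compare case-insensitively.
      seen.merge(alias.toLowerCase(Locale.ROOT), 1, Integer::sum);
    }
    List<String> ambiguous = new ArrayList<>();
    for (Map.Entry<String, Integer> e : seen.entrySet()) {
      if (e.getValue() > 1) ambiguous.add(e.getKey());
    }
    return ambiguous;
  }

  public static void main(String[] args) {
    System.out.println(ambiguousAliases(List.of("c1", "c2", "C1", "c3", "c2")));
    // prints [c1, c2]
  }
}
```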

Testing:
Performance testing with a query with 1 expressions of the
following form:
  with a as (select c1 c1, c1 c2, c1 c3, ... from t)
  select c1, c2, c3, ... from a;
repro query analysis went from 7.5 sec to less than 1 sec.

Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Reviewed-on: http://gerrit.cloudera.org:8080/17688
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Analysis slowdown with inline views and thousands of column
> ---
>
> Key: IMPALA-10799
> URL: https://issues.apache.org/jira/browse/IMPALA-10799
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.2.0
>Reporter: Xianqing He
>Assignee: Xianqing He
>Priority: Major
> Fix For: Impala 4.1
>
>
> If there are thousands of columns in the inline view, analysis is very slow. 
> For example, this SQL takes almost 4s in analysis if the inline view has 
> tens of thousands of columns:
> {code:java}
> select c1 from (select c1, c2... c10001 from T) T
>Query Compilation: 3s880ms
>- Translate start: 968.000ns (968.000ns)
>- Translate finished: 4.318ms (4.317ms)
>- Metadata of all 1 tables cached: 42.219ms (37.900ms)
>- Analysis finished: 3s776ms (3s734ms)
>- Value transfer graph computed: 3s806ms (30.163ms)
>- Single node plan created: 3s869ms (62.556ms)
>- Runtime filters computed: 3s874ms (5.603ms)
>- Distributed plan created: 3s874ms (128.086us)
>- Planning finished: 3s880ms (5.836ms)
> {code}






[jira] [Commented] (IMPALA-2019) Proper UTF-8 support in string functions

2021-07-20 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384389#comment-17384389
 ] 

ASF subversion and git services commented on IMPALA-2019:
-

Commit 4df03a31ec77b54138aba2805ff5e376463c8e23 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4df03a3 ]

IMPALA-2019(Part-2): Provide UTF-8 support in instr() and locate()

Similar to the previous patch, this patch adds UTF-8 support in instr()
and locate() builtin functions so they can have consistent behaviors
with Hive's. These two string functions both have an optional argument
as position:
INSTR(STRING str, STRING substr[, BIGINT position[, BIGINT occurrence]])
LOCATE(STRING substr, STRING str[, INT pos])
Their return values are positions of the matched substring.

In UTF-8 mode (turned on by set UTF8_MODE=true), these positions are
counted by UTF-8 characters instead of bytes.

Error handling:
Malformed UTF-8 characters are counted as one byte per character. This
is consistent with Hive since Hive replaces those bytes with U+FFFD
(REPLACEMENT CHARACTER). E.g. GenericUDFInstr calls Text#toString(),
which performs the replacement. We could offer other error-handling
behaviors, such as ignoring malformed bytes or reporting errors; IMPALA-10761
will focus on this.
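
The difference between byte positions and character positions, and the
"one byte per character" rule for malformed input, can be shown with a small
Java sketch. It is not the Impala backend code (which is C++): Utf8Positions,
seqLen() and instr() are hypothetical names, and the validation is simplified
(it does not reject overlong encodings or surrogates).

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: character-based positions over UTF-8 bytes. */
final class Utf8Positions {
  /** Byte length of the sequence starting at i; 1 if it is malformed. */
  static int seqLen(byte[] s, int i) {
    int b = s[i] & 0xFF;
    int len;
    if (b < 0x80) len = 1;
    else if (b >= 0xC2 && b <= 0xDF) len = 2;
    else if (b >= 0xE0 && b <= 0xEF) len = 3;
    else if (b >= 0xF0 && b <= 0xF4) len = 4;
    else return 1;  // stray continuation byte or invalid lead byte
    if (i + len > s.length) return 1;  // truncated sequence
    for (int k = 1; k < len; k++) {
      if ((s[i + k] & 0xC0) != 0x80) return 1;  // missing continuation byte
    }
    return len;
  }

  static boolean matchesAt(byte[] s, int i, byte[] sub) {
    if (i + sub.length > s.length) return false;
    for (int k = 0; k < sub.length; k++) {
      if (s[i + k] != sub[k]) return false;
    }
    return true;
  }

  /** 1-based character position of the first match, or 0 if there is none. */
  static int instr(byte[] str, byte[] substr) {
    int charPos = 1;
    for (int i = 0; i < str.length; i += seqLen(str, i), charPos++) {
      if (matchesAt(str, i, substr)) return charPos;
    }
    return 0;
  }

  public static void main(String[] args) {
    // Three 3-byte characters followed by "abc".
    byte[] s = "\u65e5\u672c\u8a9eabc".getBytes(StandardCharsets.UTF_8);
    byte[] sub = "abc".getBytes(StandardCharsets.UTF_8);
    System.out.println(instr(s, sub));  // 4 (the byte-based position would be 10)

    // Two malformed bytes each count as one character.
    byte[] bad = {(byte) 0xFF, (byte) 0x80, 'a'};
    System.out.println(instr(bad, new byte[] {'a'}));  // 3
  }
}
```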

Tests:
 - Add BE unit tests and e2e tests
 - Add random tests to make sure malformed UTF-8 characters won't crash
   us.

Change-Id: Ic13c3d04649c1aea56c1aaa464799b5e4674f662
Reviewed-on: http://gerrit.cloudera.org:8080/17580
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Proper UTF-8 support in string functions
> 
>
> Key: IMPALA-2019
> URL: https://issues.apache.org/jira/browse/IMPALA-2019
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: Impala 2.1, Impala 2.2
>Reporter: Andrés Cordero
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: sql-language
>
> As documented here: 
> https://impala.apache.org/docs/build/html/topics/impala_string.html
> Impala does not properly handle non-ASCII UTF-8 characters, and will return 
> results in string functions such as length that are inconsistent with Hive.






[jira] [Created] (IMPALA-10813) Invalidate external table from catalog cache for truncate table HMS api

2021-07-20 Thread Sourabh Goyal (Jira)
Sourabh Goyal created IMPALA-10813:
--

 Summary: Invalidate external table from catalog cache for truncate 
table HMS api
 Key: IMPALA-10813
 URL: https://issues.apache.org/jira/browse/IMPALA-10813
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Reporter: Sourabh Goyal


In IMPALA-10648, we started invalidating external tables when certain HMS 
endpoints are accessed from the catalog Metastore server. We missed doing the 
same for the truncate_table API. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10811) RPC to submit query getting stuck for AWS NLB forever.

2021-07-20 Thread Amogh Margoor (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384368#comment-17384368
 ] 

Amogh Margoor commented on IMPALA-10811:


DOC Jira for the same: IMPALA-10812

> RPC to submit query getting stuck for AWS NLB forever.
> --
>
> Key: IMPALA-10811
> URL: https://issues.apache.org/jira/browse/IMPALA-10811
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Amogh Margoor
>Priority: Major
> Attachments: profile+(13).txt
>
>
> The initial RPC to submit a query and fetch the query handle can take quite a 
> long time to return, as it can do various operations for planning and 
> submission that involve executing Catalog Operations like Rename or Alter 
> Table Recover Partitions, which can take time on tables with many 
> partitions ([https://github.com/apache/impala/blob/1231208da7104c832c13f272d1e5b8f554d29337/be/src/exec/catalog-op-executor.cc#L92]).
>  Attached is the profile of one such DDL query (with a few fields hidden).
> These RPCs are: 
> 1. Beeswax:
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]
> 2. HS2:
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]
>  
> One of the side effects of such an RPC taking a long time is that clients 
> such as impala-shell using AWS NLB can get stuck forever. The reason is that 
> NLB tracks connections and closes idle ones after 350s, and this timeout 
> cannot be configured. After closing the connection, however, it doesn't send 
> a TCP RST to the client. Only when the client tries to send data does NLB 
> send back a TCP RST to indicate that the connection is no longer alive. 
> Documentation is here: 
> [https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
>  Hence impala-shell, waiting for the RPC to return, gets stuck indefinitely.
> We may therefore need to evaluate techniques for the RPCs to return the query 
> handle after
>  # Creating Driver: 
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1150]
>  # Register Query: 
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1168]
>  and to execute the later parts of the RPC asynchronously in a different 
> thread without blocking the RPC. That way clients can get the query handle 
> and poll it for state and results.
>  
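
The approach proposed in the description above (return the handle early, finish
the slow work in another thread, let the client poll) could look roughly like
the following. This is a minimal Java sketch for illustration only, not Impala
code: the Impala server is written in C++, and AsyncSubmit, its State enum and
the methods submit(), poll() and shutdownAndWait() are hypothetical names.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: hand out a query handle before the slow work is done. */
final class AsyncSubmit {
  enum State { RUNNING, FINISHED, FAILED }

  private final Map<String, State> states = new ConcurrentHashMap<>();
  private final ExecutorService pool = Executors.newCachedThreadPool();

  /** Registers the query, schedules the slow part, returns the handle at once. */
  String submit(Callable<Void> slowWork) {
    String handle = UUID.randomUUID().toString();
    states.put(handle, State.RUNNING);  // "register query" step
    pool.execute(() -> {
      try {
        slowWork.call();  // e.g. a long-running catalog operation
        states.put(handle, State.FINISHED);
      } catch (Exception e) {
        states.put(handle, State.FAILED);
      }
    });
    return handle;  // the RPC can return here without waiting
  }

  /** Short, cheap call the client can repeat; returns null for unknown handles. */
  State poll(String handle) {
    return states.get(handle);
  }

  /** Stops accepting work and waits for running tasks to finish. */
  boolean shutdownAndWait(long timeoutMs) {
    pool.shutdown();
    try {
      return pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    AsyncSubmit server = new AsyncSubmit();
    String handle = server.submit(() -> { Thread.sleep(200); return null; });
    System.out.println("state right after submit: " + server.poll(handle));
    server.shutdownAndWait(5000);
    System.out.println("state after the work is done: " + server.poll(handle));
  }
}
```

With this shape, every client request is short, so the connection does not sit
idle for the whole duration of the slow operation.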






[jira] [Updated] (IMPALA-10811) RPC to submit query getting stuck for AWS NLB forever.

2021-07-20 Thread Amogh Margoor (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amogh Margoor updated IMPALA-10811:
---
Attachment: profile+(13).txt

> RPC to submit query getting stuck for AWS NLB forever.
> --
>
> Key: IMPALA-10811
> URL: https://issues.apache.org/jira/browse/IMPALA-10811
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Amogh Margoor
>Priority: Major
> Attachments: profile+(13).txt
>
>
> Initial RPC to submit a query and fetch the query handle can take quite long 
> time to return as it can do various operations for planning and submission 
> that involve executing  Catalog Operations like Rename, Alter Table Recover 
> partition  that can take time on tables with many 
> partitions([https://github.com/apache/impala/blob/1231208da7104c832c13f272d1e5b8f554d29337/be/src/exec/catalog-op-executor.cc#L92]).
>  Attached is the profile of one such DDL query (with few fields hidden).
> These RPCs are: 
> 1. Beeswax:
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]
> 2. HS2:
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]
>  
> One of the side effects of such RPC taking long time is that clients such as 
> impala-shell using AWS NLB can get stuck for ever. The reason is NLB tracks 
> and closes connections after 350s and cannot be configured. But after closing 
> the connection it doesn't send TCP RST to the client. Only when client tries 
> to send data or packets NLB issues back TCP RST to indicate connection is not 
> alive. Documentation is here: 
> [https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
>  Hence the impala-shell waiting for RPC to return gets stuck indefinitely.
> Hence, we may need to evaluate techniques for RPCs to return query handle 
> after
>  # Creating Driver: 
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1150]
>  # Register Query: 
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1168]
>  and execute later parts of RPC asynchronously in different thread without 
> blocking the RPC. That way clients can get query handle and poll for it for 
> state and results.
>  






[jira] [Updated] (IMPALA-10811) RPC to submit query getting stuck for AWS NLB forever.

2021-07-20 Thread Amogh Margoor (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amogh Margoor updated IMPALA-10811:
---
Description: 
Initial RPC to submit a query and fetch the query handle can take quite long 
time to return as it can do various operations for planning and submission that 
involve executing  Catalog Operations like Rename, Alter Table Recover 
partition  that can take time on tables with many 
partitions([https://github.com/apache/impala/blob/1231208da7104c832c13f272d1e5b8f554d29337/be/src/exec/catalog-op-executor.cc#L92]).
 Attached is the profile of one such DDL query (with few fields hidden).

These RPCs are: 

1. Beeswax:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]

2. HS2:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]

 

One of the side effects of such RPC taking long time is that clients such as 
impala-shell using AWS NLB can get stuck for ever. The reason is NLB tracks and 
closes connections after 350s and cannot be configured. But after closing the 
connection it doesn't send TCP RST to the client. Only when client tries to 
send data or packets NLB issues back TCP RST to indicate connection is not 
alive. Documentation is here: 
[https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
 Hence the impala-shell waiting for RPC to return gets stuck indefinitely.

Hence, we may need to evaluate techniques for RPCs to return query handle after
 # Creating Driver: 
[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1150]
 # Register Query: 
[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1168]

 and execute later parts of RPC asynchronously in different thread without 
blocking the RPC. That way clients can get query handle and poll for it for 
state and results.

 

  was:
Initial RPC to submit a query and fetch the query handle can take quite long 
time to return as it can do various operations for planning and submission that 
involve executing  Catalog Operations like Rename, Alter Table Recover 
partition  that can take time on tables with many 
partitions([https://github.com/apache/impala/blob/1231208da7104c832c13f272d1e5b8f554d29337/be/src/exec/catalog-op-executor.cc#L92]).
 Attached is the profile of one such DDL query.

These RPCs are: 

1. Beeswax:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]

2. HS2:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]

 

One of the side effects of such RPC taking long time is that clients such as 
impala-shell using AWS NLB can get stuck for ever. The reason is NLB tracks and 
closes connections after 350s and cannot be configured. But after closing the 
connection it doesn't send TCP RST to the client. Only when client tries to 
send data or packets NLB issues back TCP RST to indicate connection is not 
alive. Documentation is here: 
[https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
 Hence the impala-shell waiting for RPC to return gets stuck indefinitely.

Hence, we may need to evaluate techniques for RPCs to return query handle after
 # Creating Driver: 
https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1150
 # Register Query: 
[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1168]

 and execute later parts of RPC asynchronously in different thread without 
blocking the RPC. That way clients can get query handle and poll for it for 
state and results.

 


> RPC to submit query getting stuck for AWS NLB forever.
> --
>
> Key: IMPALA-10811
> URL: https://issues.apache.org/jira/browse/IMPALA-10811
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Amogh Margoor
>Priority: Major
>
> Initial RPC to submit a query and fetch the query handle can take quite long 
> time to return as it can do various operations for planning and submission 
> that involve executing  Catalog Operations like Rename, Alter Table Recover 
> partition  that can take time on tables with many 
> partitions([https://github.com/apache/impala/blob/1231208da7104c832c13f272d1e5b8f554d29337/be/src/exec/catalog-op-executor.cc#L92]).
>  Attached is the profile of one such DDL query (with few fields hidden).
> These RPCs are: 
> 1. Beeswax:
> 

[jira] [Resolved] (IMPALA-5628) Parquet support for additional valid decimal representations

2021-07-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-5628.
---
Fix Version/s: Impala 4.1
   Resolution: Fixed

> Parquet support for additional valid decimal representations
> 
>
> Key: IMPALA-5628
> URL: https://issues.apache.org/jira/browse/IMPALA-5628
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: ramp-up
> Fix For: Impala 4.1
>
>
> This is an umbrella JIRA to implement valid representations of DECIMAL that 
> Impala doesn't currently support.
> See https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md






[jira] [Updated] (IMPALA-10812) [DOCS] RPC to submit query getting stuck for AWS NLB forever.

2021-07-20 Thread Amogh Margoor (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amogh Margoor updated IMPALA-10812:
---
Description: 
We would need to document the behaviour of IMPALA-10811 as a limitation with 
AWS NLB. Problem description:

 

Initial RPC to submit a query and fetch the query handle can take quite long 
time to return as it can do various operations for planning and submission that 
involve executing  Catalog Operations like Rename, Alter Table Recover 
partition  that can take time on tables with many 
partitions([https://github.com/apache/impala/blob/1231208da7104c832c13f272d1e5b8f554d29337/be/src/exec/catalog-op-executor.cc#L92]).
 Attached is the profile of one such DDL query.

These RPCs are: 

1. Beeswax:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]

2. HS2:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]

 

One of the side effects of such an RPC taking a long time is that clients such 
as impala-shell using AWS NLB can get stuck forever. The reason is that NLB 
tracks connections and closes idle ones after 350s, and this timeout cannot be 
configured. After closing the connection, however, it doesn't send a TCP RST 
to the client. Only when the client tries to send data does NLB send back a 
TCP RST to indicate that the connection is no longer alive. Documentation is 
here: 
[https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
 Hence clients like impala-shell waiting for the RPC to return get stuck 
indefinitely.

 

  was:
Initial RPC to submit a query and fetch the query handle can take quite long 
time to return as it can do various operations for planning and submission that 
involve executing  Catalog Operations like Rename, Alter Table Recover 
partition  that can take time on tables with many 
partitions([https://github.com/apache/impala/blob/1231208da7104c832c13f272d1e5b8f554d29337/be/src/exec/catalog-op-executor.cc#L92]).
 Attached is the profile of one such DDL query.

These RPCs are: 

1. Beeswax:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]

2. HS2:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]

 

One of the side effects of such RPC taking long time is that clients such as 
impala-shell using AWS NLB can get stuck for ever. The reason is NLB tracks and 
closes connections after 350s and cannot be configured. But after closing the 
connection it doesn't send TCP RST to the client. Only when client tries to 
send data or packets NLB issues back TCP RST to indicate connection is not 
alive. Documentation is here: 
[https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
 Hence the impala-shell waiting for RPC to return gets stuck indefinitely.

Hence, we may need to evaluate techniques for RPCs to return query handle after
 # Creating Driver: 
https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1150
 # Register Query: 
[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1168]

 and execute later parts of RPC asynchronously in different thread without 
blocking the RPC. That way clients can get query handle and poll for it for 
state and results.

 


> [DOCS] RPC to submit query getting stuck for AWS NLB forever.
> -
>
> Key: IMPALA-10812
> URL: https://issues.apache.org/jira/browse/IMPALA-10812
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Amogh Margoor
>Priority: Major
>
> We would need to document the behaviour of IMPALA-10811 as a limitation with 
> AWS NLB. Problem description:
>  
> Initial RPC to submit a query and fetch the query handle can take quite long 
> time to return as it can do various operations for planning and submission 
> that involve executing  Catalog Operations like Rename, Alter Table Recover 
> partition  that can take time on tables with many 
> partitions([https://github.com/apache/impala/blob/1231208da7104c832c13f272d1e5b8f554d29337/be/src/exec/catalog-op-executor.cc#L92]).
>  Attached is the profile of one such DDL query.
> These RPCs are: 
> 1. Beeswax:
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]
> 2. HS2:
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]
>  
> One of the side effects of such RPC taking long time is that clients such as 
> impala-shell using AWS NLB 

[jira] [Created] (IMPALA-10812) [DOCS] RPC to submit query getting stuck for AWS NLB forever.

2021-07-20 Thread Amogh Margoor (Jira)
Amogh Margoor created IMPALA-10812:
--

 Summary: [DOCS] RPC to submit query getting stuck for AWS NLB 
forever.
 Key: IMPALA-10812
 URL: https://issues.apache.org/jira/browse/IMPALA-10812
 Project: IMPALA
  Issue Type: Bug
Reporter: Amogh Margoor


Initial RPC to submit a query and fetch the query handle can take quite long 
time to return as it can do various operations for planning and submission that 
involve executing  Catalog Operations like Rename, Alter Table Recover 
partition  that can take time on tables with many 
partitions([https://github.com/apache/impala/blob/1231208da7104c832c13f272d1e5b8f554d29337/be/src/exec/catalog-op-executor.cc#L92]).
 Attached is the profile of one such DDL query.

These RPCs are: 

1. Beeswax:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]

2. HS2:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]

 

One of the side effects of such RPC taking long time is that clients such as 
impala-shell using AWS NLB can get stuck for ever. The reason is NLB tracks and 
closes connections after 350s and cannot be configured. But after closing the 
connection it doesn't send TCP RST to the client. Only when client tries to 
send data or packets NLB issues back TCP RST to indicate connection is not 
alive. Documentation is here: 
[https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
 Hence the impala-shell waiting for RPC to return gets stuck indefinitely.

Hence, we may need to evaluate techniques for RPCs to return query handle after
 # Creating Driver: 
https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1150
 # Register Query: 
[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1168]

 and execute later parts of RPC asynchronously in different thread without 
blocking the RPC. That way clients can get query handle and poll for it for 
state and results.

 






[jira] [Updated] (IMPALA-10811) RPC to submit query getting stuck for AWS NLB forever.

2021-07-20 Thread Amogh Margoor (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amogh Margoor updated IMPALA-10811:
---
Description: 
Initial RPC to submit a query and fetch the query handle can take quite long 
time to return as it can do various operations for planning and submission that 
involve executing  Catalog Operations like Rename, Alter Table Recover 
partition  that can take time on tables with many 
partitions([https://github.com/apache/impala/blob/1231208da7104c832c13f272d1e5b8f554d29337/be/src/exec/catalog-op-executor.cc#L92]).
 Attached is the profile of one such DDL query.

These RPCs are: 

1. Beeswax:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]

2. HS2:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]

 

One of the side effects of such RPC taking long time is that clients such as 
impala-shell using AWS NLB can get stuck for ever. The reason is NLB tracks and 
closes connections after 350s and cannot be configured. But after closing the 
connection it doesn't send TCP RST to the client. Only when client tries to 
send data or packets NLB issues back TCP RST to indicate connection is not 
alive. Documentation is here: 
[https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
 Hence the impala-shell waiting for RPC to return gets stuck indefinitely.

Hence, we may need to evaluate techniques for RPCs to return query handle after
 # Creating Driver: 
https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1150
 # Register Query: 
[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1168]

 and execute later parts of RPC asynchronously in different thread without 
blocking the RPC. That way clients can get query handle and poll for it for 
state and results.

 

  was:
Initial RPC to submit a query and fetch the query handle can take quite long 
time to return as it can do various operations for planning and submission that 
involve executing  Catalog Operations like Rename, Alter Table Recover 
partition  that can take time on tables with many 
partitions(https://github.com/apache/impala/blob/1231208da7104c832c13f272d1e5b8f554d29337/be/src/exec/catalog-op-executor.cc#L92).
 Attached is the profile of one such DDL query.

These RPCs are: 

1. Beeswax:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]

2. HS2:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]

 

One of the side effects of such RPC taking long time is that clients such as 
impala-shell using AWS NLB can get stuck for ever. The reason is NLB tracks and 
closes connections after 350s and cannot be configured. But after closing the 
connection it doesn;t send TCP RST to the client. Only when client tries to 
send data or packets NLB issues back TCP RST to indicate connection is not 
alive. Documentation is here: 
[https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
 Hence the impala-shell waiting for RPC to return gets stuck indefinitely.

Hence, we may need to evaluate techniques for RPCs to return query handle after
 # Creating Driver,
 # Register Query 
([https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1168])

 and execute later parts of RPC asynchronously in different thread without 
blocking the RPC. That way clients can get query handle and poll for it for 
state and results.

 


> RPC to submit query getting stuck for AWS NLB forever.
> --
>
> Key: IMPALA-10811
> URL: https://issues.apache.org/jira/browse/IMPALA-10811
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Amogh Margoor
>Priority: Major
>
> Initial RPC to submit a query and fetch the query handle can take quite long 
> time to return as it can do various operations for planning and submission 
> that involve executing  Catalog Operations like Rename, Alter Table Recover 
> partition  that can take time on tables with many 
> partitions([https://github.com/apache/impala/blob/1231208da7104c832c13f272d1e5b8f554d29337/be/src/exec/catalog-op-executor.cc#L92]).
>  Attached is the profile of one such DDL query.
> These RPCs are: 
> 1. Beeswax:
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]
> 2. HS2:
> 

[jira] [Updated] (IMPALA-10811) RPC to submit query getting stuck for AWS NLB forever.

2021-07-20 Thread Amogh Margoor (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amogh Margoor updated IMPALA-10811:
---
Description: 
The initial RPC to submit a query and fetch the query handle can take quite a 
long time to return, because planning and submission can involve executing 
Catalog operations such as Rename or Alter Table Recover Partitions, which can 
take time on tables with many partitions 
(https://github.com/apache/impala/blob/1231208da7104c832c13f272d1e5b8f554d29337/be/src/exec/catalog-op-executor.cc#L92).
 Attached is the profile of one such DDL query.

These RPCs are: 

1. Beeswax:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]

2. HS2:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]

 

One side effect of such an RPC taking a long time is that clients such as 
impala-shell connecting through an AWS NLB can get stuck forever. The reason is 
that the NLB tracks connections and closes them after 350s, and this timeout 
cannot be configured. After closing the connection, however, it doesn't send a 
TCP RST to the client. Only when the client tries to send data or packets does 
the NLB send back a TCP RST to indicate that the connection is no longer alive. 
Documentation is here: 
[https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
 Hence impala-shell, waiting for the RPC to return, gets stuck indefinitely.

Hence, we may need to evaluate techniques for the RPCs to return the query 
handle after
 # Creating Driver,
 # Register Query 
([https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1168])

 and execute the later parts of the RPC asynchronously in a different thread 
without blocking the RPC. That way clients can get the query handle and poll it 
for state and results.

 

  was:
Initial RPC to submit a query and fetch the query handle can take quite long 
time to return due to expensive Catalog Operations like Rename, Alter Table 
Recover partition on tables with many partitions. Attached is the profile of 
one such DDL query.

These RPCs are: 

1. Beeswax:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]

2. HS2:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]

 

One of the side effects of such RPC taking long time is that clients such as 
impala-shell using AWS NLB can get stuck for ever. The reason is NLB tracks and 
closes connections after 350s and cannot be configured. But after closing the 
connection it doesn;t send TCP RST to the client. Only when client tries to 
send data or packets NLB issues back TCP RST to indicate connection is not 
alive. Documentation is here: 
[https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
 Hence the impala-shell waiting for RPC to return gets stuck indefinitely.

Hence, we may need to evaluate techniques for RPCs to return query handle after
 # Creating Driver,
 # Register Query 
([https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1168])

 and execute later parts of RPC asynchronously in different thread without 
blocking the RPC. That way clients can get query handle and poll for it for 
state and results.

 


> RPC to submit query getting stuck for AWS NLB forever.
> --
>
> Key: IMPALA-10811
> URL: https://issues.apache.org/jira/browse/IMPALA-10811
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Amogh Margoor
>Priority: Major
>
> Initial RPC to submit a query and fetch the query handle can take quite long 
> time to return as it can do various operations for planning and submission 
> that involve executing  Catalog Operations like Rename, Alter Table Recover 
> partition  that can take time on tables with many 
> partitions(https://github.com/apache/impala/blob/1231208da7104c832c13f272d1e5b8f554d29337/be/src/exec/catalog-op-executor.cc#L92).
>  Attached is the profile of one such DDL query.
> These RPCs are: 
> 1. Beeswax:
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]
> 2. HS2:
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]
>  
> One of the side effects of such RPC taking long time is that clients such as 
> impala-shell using AWS NLB can get stuck for ever. The reason is NLB tracks 
> and closes connections after 350s and cannot be configured. But after closing 
> the connection it doesn;t send TCP 

[jira] [Updated] (IMPALA-10811) RPC to submit query getting stuck for AWS NLB forever.

2021-07-20 Thread Amogh Margoor (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amogh Margoor updated IMPALA-10811:
---
Description: 
The initial RPC to submit a query and fetch the query handle can take quite a 
long time to return due to expensive Catalog operations such as Rename or Alter 
Table Recover Partitions on tables with many partitions. Attached is the 
profile of one such DDL query.

These RPCs are: 

1. Beeswax:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]

2. HS2:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]

 

One side effect of such an RPC taking a long time is that clients such as 
impala-shell connecting through an AWS NLB can get stuck forever. The reason is 
that the NLB tracks connections and closes them after 350s, and this timeout 
cannot be configured. After closing the connection, however, it doesn't send a 
TCP RST to the client. Only when the client tries to send data or packets does 
the NLB send back a TCP RST to indicate that the connection is no longer alive. 
Documentation is here: 
[https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
 Hence impala-shell, waiting for the RPC to return, gets stuck indefinitely.

Hence, we may need to evaluate techniques for the RPCs to return the query 
handle after
 # Creating Driver,
 # Register Query 
([https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1168])

 and execute the later parts of the RPC asynchronously in a different thread 
without blocking the RPC. That way clients can get the query handle and poll it 
for state and results.

 

  was:
Initial RPC to submit a query and fetch the query handle can take quite long 
time to return due to expensive Catalog Operations like Rename, Alter Table 
Recover partition on tables with many partitions. Attached is the profile of 
one such DDL query.

These RPCs are: 

1. Beeswax:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]

2. HS2:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]

 

One of the side effects of such RPC taking long time is that clients such as 
impala-shell using AWS NLB can get stuck for ever. The reason is NLB tracks and 
closes connections after 350s and cannot be configured. But after closing the 
connection it doesn;t send TCP RST to the client. Only when client tries to 
send data or packets NLB issues back TCP RST to indicate connection is not 
alive. Documentation is here: 
[https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
 Hence the impala-shell waiting for RPC to return gets stuck indefinitely.

Hence, we may need to evaluate techniques for RPCs to return query handle 
sooner right after the Query Registration () and execute later parts of RPC 
asynchronously so that clients can get query handle and poll for it for results.

 


> RPC to submit query getting stuck for AWS NLB forever.
> --
>
> Key: IMPALA-10811
> URL: https://issues.apache.org/jira/browse/IMPALA-10811
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Amogh Margoor
>Priority: Major
>
> Initial RPC to submit a query and fetch the query handle can take quite long 
> time to return due to expensive Catalog Operations like Rename, Alter Table 
> Recover partition on tables with many partitions. Attached is the profile of 
> one such DDL query.
> These RPCs are: 
> 1. Beeswax:
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]
> 2. HS2:
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]
>  
> One of the side effects of such RPC taking long time is that clients such as 
> impala-shell using AWS NLB can get stuck for ever. The reason is NLB tracks 
> and closes connections after 350s and cannot be configured. But after closing 
> the connection it doesn;t send TCP RST to the client. Only when client tries 
> to send data or packets NLB issues back TCP RST to indicate connection is not 
> alive. Documentation is here: 
> [https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
>  Hence the impala-shell waiting for RPC to return gets stuck indefinitely.
> Hence, we may need to evaluate techniques for RPCs to return query handle 
> after
>  # Creating Driver,
>  # Register Query 
> 

[jira] [Updated] (IMPALA-10811) RPC to submit query getting stuck for AWS NLB forever.

2021-07-20 Thread Amogh Margoor (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amogh Margoor updated IMPALA-10811:
---
Summary: RPC to submit query getting stuck for AWS NLB forever.  (was: RPC 
to submit query getting stuck for AWS NLB for ever.)

> RPC to submit query getting stuck for AWS NLB forever.
> --
>
> Key: IMPALA-10811
> URL: https://issues.apache.org/jira/browse/IMPALA-10811
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Amogh Margoor
>Priority: Major
>
> Initial RPC to submit a query and fetch the query handle can take quite long 
> time to return due to expensive Catalog Operations like Rename, Alter Table 
> Recover partition on tables with many partitions. Attached is the profile of 
> one such DDL query.
> These RPCs are: 
> 1. Beeswax:
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]
> 2. HS2:
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]
>  
> One of the side effects of such RPC taking long time is that clients such as 
> impala-shell using AWS NLB can get stuck for ever. The reason is NLB tracks 
> and closes connections after 350s and cannot be configured. But after closing 
> the connection it doesn;t send TCP RST to the client. Only when client tries 
> to send data or packets NLB issues back TCP RST to indicate connection is not 
> alive. Documentation is here: 
> [https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
>  Hence the impala-shell waiting for RPC to return gets stuck indefinitely.
> Hence, we may need to evaluate techniques for RPCs to return query handle 
> sooner right after the Query Registration () and execute later parts of RPC 
> asynchronously so that clients can get query handle and poll for it for 
> results.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10811) RPC to submit query getting stuck for AWS NLB for ever.

2021-07-20 Thread Amogh Margoor (Jira)
Amogh Margoor created IMPALA-10811:
--

 Summary: RPC to submit query getting stuck for AWS NLB for ever.
 Key: IMPALA-10811
 URL: https://issues.apache.org/jira/browse/IMPALA-10811
 Project: IMPALA
  Issue Type: Bug
Reporter: Amogh Margoor


The initial RPC to submit a query and fetch the query handle can take quite a 
long time to return due to expensive Catalog operations such as Rename or Alter 
Table Recover Partitions on tables with many partitions. Attached is the 
profile of one such DDL query.

These RPCs are: 

1. Beeswax:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]

2. HS2:

[https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]

 

One side effect of such an RPC taking a long time is that clients such as 
impala-shell connecting through an AWS NLB can get stuck forever. The reason is 
that the NLB tracks connections and closes them after 350s, and this timeout 
cannot be configured. After closing the connection, however, it doesn't send a 
TCP RST to the client. Only when the client tries to send data or packets does 
the NLB send back a TCP RST to indicate that the connection is no longer alive. 
Documentation is here: 
[https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
 Hence impala-shell, waiting for the RPC to return, gets stuck indefinitely.

Hence, we may need to evaluate techniques for the RPCs to return the query 
handle sooner, right after the Query Registration (), and execute the later 
parts of the RPC asynchronously so that clients can get the query handle and 
poll it for results.

 






[jira] [Updated] (IMPALA-10810) Bump json-smart from 2.3 to at least 2.4.1

2021-07-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10810:
---
Component/s: Frontend

> Bump json-smart from 2.3 to at least 2.4.1
> --
>
> Key: IMPALA-10810
> URL: https://issues.apache.org/jira/browse/IMPALA-10810
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>
> I noticed that our json-smart dependency is stale and we could pick up a 
> newer version.






[jira] [Created] (IMPALA-10810) Bump json-smart from 2.3 to at least 2.4.1

2021-07-20 Thread Jira
Zoltán Borók-Nagy created IMPALA-10810:
--

 Summary: Bump json-smart from 2.3 to at least 2.4.1
 Key: IMPALA-10810
 URL: https://issues.apache.org/jira/browse/IMPALA-10810
 Project: IMPALA
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


I noticed that our json-smart dependency is stale and we could pick up a newer 
version.






[jira] [Updated] (IMPALA-10808) Crash of illegal decimal schema in test_fuzz_decimal_tbl

2021-07-20 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10808:

Affects Version/s: Impala 4.1

> Crash of illegal decimal schema in test_fuzz_decimal_tbl
> 
>
> Key: IMPALA-10808
> URL: https://issues.apache.org/jira/browse/IMPALA-10808
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Blocker
>
> Recently saw two unrelated jobs failed by the same crash:
>  * [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14369]
>  * [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14381]
> For example in the second job, the test that crashes impalad is {code}
> query_test/test_scanners_fuzz.py::TestScannersFuzzing::()::test_fuzz_decimal_tbl[protocol:beeswax|exec_option:{'debug_action':'-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@0.5';'abort_on_error':False;'mem_limit':'512m';'num_nodes':0}|table_format:parquet/none
> {code}
> The failure is
> {code:java}
> I0720 03:34:53.168516 126039 runtime-state.cc:196] 
> 8a42e69ff49106c8:d2096a71] Error from query 
> 8a42e69ff49106c8:d2096a70: File 
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_decimal_tbl_4a8e12be.db/decimal_tbl/d6=1/copy1_6b48619353a75ffb-66460f74_973668612_data.0.parq'
>  column 'd1' does not have the decimal precision set.
> F0720 03:34:53.168567 126039 types.h:282] 8a42e69ff49106c8:d2096a71] 
> Check failed: precision > 0 (0 vs. 0)
> {code}
> CC [~boroknagyz] who owns the first job.






[jira] [Assigned] (IMPALA-10808) Crash of illegal decimal schema in test_fuzz_decimal_tbl

2021-07-20 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-10808:
---

Assignee: Quanlong Huang

> Crash of illegal decimal schema in test_fuzz_decimal_tbl
> 
>
> Key: IMPALA-10808
> URL: https://issues.apache.org/jira/browse/IMPALA-10808
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Blocker
>
> Recently saw two unrelated jobs failed by the same crash:
>  * [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14369]
>  * [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14381]
> For example in the second job, the test that crashes impalad is {code}
> query_test/test_scanners_fuzz.py::TestScannersFuzzing::()::test_fuzz_decimal_tbl[protocol:beeswax|exec_option:{'debug_action':'-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@0.5';'abort_on_error':False;'mem_limit':'512m';'num_nodes':0}|table_format:parquet/none
> {code}
> The failure is
> {code:java}
> I0720 03:34:53.168516 126039 runtime-state.cc:196] 
> 8a42e69ff49106c8:d2096a71] Error from query 
> 8a42e69ff49106c8:d2096a70: File 
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_decimal_tbl_4a8e12be.db/decimal_tbl/d6=1/copy1_6b48619353a75ffb-66460f74_973668612_data.0.parq'
>  column 'd1' does not have the decimal precision set.
> F0720 03:34:53.168567 126039 types.h:282] 8a42e69ff49106c8:d2096a71] 
> Check failed: precision > 0 (0 vs. 0)
> {code}
> CC [~boroknagyz] who owns the first job.






[jira] [Updated] (IMPALA-10809) improve the performance of unnest operation

2021-07-20 Thread pengdou1990 (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengdou1990 updated IMPALA-10809:
-
Description: 
h2. Current situation

Impala's support for complex data types is not particularly friendly.

For example, to expand rows containing ARRAY-type fields, you need to unnest 
the array fields first and then do a nested loop join.

To expand multiple array fields, you need to perform multiple unnests and 
multiple nested loop joins, which puts a lot of computational pressure on the 
executor.

DDL:
{code:java}
CREATE TABLE rawdata.users2 (
  day INT,
  sampling_group INT,
  user_id BIGINT,
  time TIMESTAMP,
  _offset BIGINT,
  event_id INT,
  month_id INT,
  week_id INT,
  distinct_id STRING,
  event_bucket INT,
  adresses_list_string ARRAY<STRING>,
  count_list_bigint ARRAY<BIGINT>
)
WITH SERDEPROPERTIES ('serialization.format'='1')
STORED AS PARQUET
LOCATION 'hdfs://localhost:20500/test-warehouse/rawdata.db/users2'{code}

 Query SQL:
{code:java}
SELECT
    `day`,
    list1.item,
    list2.item
FROM
    rawdata.users2,
    rawdata.users2.adresses_list_string list1,
    rawdata.users2.count_list_bigint list2{code}

 Simplified Plan:

 
{code:java}
F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|
07:EXCHANGE [UNPARTITIONED]
|
01:SUBPLAN
|
|--06:NESTED LOOP JOIN [CROSS JOIN]
| |
| |--04:UNNEST [users2.count_list_bigint clist]
| |
| 05:NESTED LOOP JOIN [CROSS JOIN]
| |
| |--02:SINGULAR ROW SRC
| |
| 03:UNNEST [users2.adresses_list_string list]
|
00:SCAN HDFS [rawdata.users2, RANDOM]

 {code}
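
What the plan above does for each scanned row can be sketched in Python. This is an illustration of the plan shape only, not Impala's implementation: the function names are hypothetical, and each helper materializes a list where the real operators stream row batches. The column names are taken from the DDL above.

```python
def unnest(row, column):
    """UNNEST: emit one single-column row per element of the array column."""
    return [{column + ".item": item} for item in row[column]]


def nested_loop_cross_join(left_rows, right_rows):
    """NESTED LOOP JOIN [CROSS JOIN]: pair every left row with every right row."""
    return [{**left, **right} for left in left_rows for right in right_rows]


def run_subplan(row):
    """Evaluate the SUBPLAN once for a single row produced by the scan."""
    singular = [{"day": row["day"]}]                       # 02:SINGULAR ROW SRC
    joined = nested_loop_cross_join(                       # 05:NESTED LOOP JOIN
        singular, unnest(row, "adresses_list_string"))     # 03:UNNEST
    return nested_loop_cross_join(                         # 06:NESTED LOOP JOIN
        joined, unnest(row, "count_list_bigint"))          # 04:UNNEST
```

For a row with 2 addresses and 3 counts, `run_subplan` returns 2 * 3 = 6 rows, and the subplan is re-evaluated for every scanned row; each additional array column adds another unnest and another join.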
h2. Proposed improvement

In actual use, I found that changing the calculation logic of unnest greatly 
improves performance:

First, in the FE, construct a new plan node type, named explode node. It and 
its child node form a pipelined operation.

Then, in the BE, each row is exploded locally, and the field layout follows 
that of the child node.

The query SQL and the plan are greatly simplified:

Query SQL:
{code:java}
SELECT
    `day`,
   explode(adresses_list_string),
   explode(count_list_bigint) 
 from
   rawdata.users2{code}

 The simplified plan looks like this:
{code:java}
F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|
02:EXCHANGE [UNPARTITIONED]
|
01:EXPLODE NODE [UNPARTITIONED]
|
00:SCAN HDFS [rawdata.users2, RANDOM] {code}
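
The pipelined shape of this plan can be sketched with Python generators. This is a hedged illustration rather than the proposed implementation: `scan` and `explode_node` are hypothetical names, and the sketch assumes that several `explode()` calls in one SELECT produce the per-row cross product, matching the original query. The issue text does not state those semantics.

```python
from itertools import product


def scan(rows):
    """Stand-in for 00:SCAN HDFS: yields rows one at a time."""
    yield from rows


def explode_node(child, array_columns):
    """Stand-in for 01:EXPLODE NODE: expands each child row in place.

    Scalar columns keep the child's values; each array column is replaced by
    one of its elements. Assumption: multiple arrays combine as a cross product.
    """
    for row in child:
        scalars = {k: v for k, v in row.items() if k not in array_columns}
        for items in product(*(row[col] for col in array_columns)):
            yield {**scalars, **dict(zip(array_columns, items))}
```

Because both stages are generators, each scanned row flows straight through the explode step with no subplan or join operator in between, for example `explode_node(scan(rows), ["adresses_list_string", "count_list_bigint"])`.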
 

  was:
h2. current situation

Impala's support for complex data types is not particularly friendly.

For example, if you need to expand rows containing Array type fields, you need 
to unnest the array fields first, and then do a nested loop join.

If you need to expand multiple array fields, you need to do multiple unnests, 
And perform multiple unest and nested loop joins, which puts a lot of 
computational pressure on the executor. 

DDL:
CREATE TABLE rawdata.users2 (                                     
  day INT,                                                        
  sampling_group INT,                                             
  user_id BIGINT,                                                 
  time TIMESTAMP,                                                 
  _offset BIGINT,                                                 
  event_id INT,                                                   
  month_id INT,                                                   
  week_id INT,                                                    
  distinct_id STRING,                                             
  event_bucket INT,                                               
  adresses_list_string ARRAY,                             
  count_list_bigint ARRAY                                 
)                                                                 
WITH SERDEPROPERTIES ('serialization.format'='1')                 
STORED AS PARQUET                                                 
LOCATION 'hdfs://localhost:20500/test-warehouse/rawdata.db/users2'
Query SQL:
SELECT
    `day`,
    list`.item,
    list1.item 
FROM
    rawdata.users2,
    rawdata.users2.adresses_list_string list1,
    rawdata.users2.count_list_bigint list2
Simplified Plan:
F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|
07:EXCHANGE [UNPARTITIONED]
|
01:SUBPLAN
|

[jira] [Created] (IMPALA-10809) improve the performance of unnest operation

2021-07-20 Thread pengdou1990 (Jira)
pengdou1990 created IMPALA-10809:


 Summary: improve the performance of unnest operation
 Key: IMPALA-10809
 URL: https://issues.apache.org/jira/browse/IMPALA-10809
 Project: IMPALA
  Issue Type: Improvement
Reporter: pengdou1990


h2. Current situation

Impala's support for complex data types is not particularly friendly.

For example, to expand rows containing ARRAY-type fields, you need to unnest 
the array fields first and then do a nested loop join.

To expand multiple array fields, you need to perform multiple unnests and 
multiple nested loop joins, which puts a lot of computational pressure on the 
executor.

DDL:
CREATE TABLE rawdata.users2 (                                     
  day INT,                                                        
  sampling_group INT,                                             
  user_id BIGINT,                                                 
  time TIMESTAMP,                                                 
  _offset BIGINT,                                                 
  event_id INT,                                                   
  month_id INT,                                                   
  week_id INT,                                                    
  distinct_id STRING,                                             
  event_bucket INT,                                               
  adresses_list_string ARRAY<STRING>,
  count_list_bigint ARRAY<BIGINT>
)                                                                 
WITH SERDEPROPERTIES ('serialization.format'='1')                 
STORED AS PARQUET                                                 
LOCATION 'hdfs://localhost:20500/test-warehouse/rawdata.db/users2'
Query SQL:
SELECT
    `day`,
    list1.item,
    list2.item
FROM
    rawdata.users2,
    rawdata.users2.adresses_list_string list1,
    rawdata.users2.count_list_bigint list2
Simplified Plan:
F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|
07:EXCHANGE [UNPARTITIONED]
|
01:SUBPLAN
|
|--06:NESTED LOOP JOIN [CROSS JOIN]
|  |
|  |--04:UNNEST [users2.count_list_bigint clist]
|  |
|  05:NESTED LOOP JOIN [CROSS JOIN]
|  |
|  |--02:SINGULAR ROW SRC
|  |
|  03:UNNEST [users2.adresses_list_string list]
|
00:SCAN HDFS [rawdata.users2, RANDOM]
h2. Proposed improvement

In actual use, I found that changing the calculation logic of unnest greatly 
improves performance:

First, in the FE, construct a new plan node type, named explode node. It and 
its child node form a pipelined operation.

Then, in the BE, each row is exploded locally, and the field layout follows 
that of the child node.

The query SQL and the plan are greatly simplified:

Query SQL:
SELECT
    `day`,
    explode(adresses_list_string),
    explode(count_list_bigint) 
from
    rawdata.users2
The simplified plan looks like this:
F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|
02:EXCHANGE [UNPARTITIONED]
|
01:EXPLODE NODE [UNPARTITIONED] 
|
00:SCAN HDFS [rawdata.users2, RANDOM]
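
To make the proposed per-row behavior concrete, here is a minimal Python sketch. It assumes that exploding two array columns in one SELECT yields the same per-row cross product as the original UNNEST plus nested loop join plan; the issue does not spell out these semantics. `explode_row` is a hypothetical helper for illustration, not Impala code.

```python
from itertools import product
from typing import Any, Dict, Iterator, List, Tuple


def explode_row(row: Dict[str, Any], scalar_cols: List[str],
                array_cols: List[str]) -> Iterator[Tuple[Any, ...]]:
    """Yield one output tuple per combination of array items (hypothetical).

    Assumption: a row whose array is empty or NULL produces no output,
    mirroring the inner cross-join semantics of the original query.
    """
    scalars = tuple(row[c] for c in scalar_cols)
    arrays = [row[c] or [] for c in array_cols]
    # The row is expanded locally; no join operator is involved.
    for combo in product(*arrays):
        yield scalars + combo


rows = [
    {"day": 1, "adresses_list_string": ["a", "b"], "count_list_bigint": [10, 20]},
    {"day": 2, "adresses_list_string": ["c"], "count_list_bigint": []},
]
result = [t for r in rows
          for t in explode_row(r, ["day"],
                               ["adresses_list_string", "count_list_bigint"])]
```

The first row expands into four tuples, and the second row yields nothing because one of its arrays is empty.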



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9659) Document supported distros for Impala 4.0

2021-07-20 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383843#comment-17383843
 ] 

ASF subversion and git services commented on IMPALA-9659:
-

Commit 602eec3b6e712c54cb2e78f534991aced74b7d33 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=602eec3 ]

IMPALA-9659: [DOCS] Document supported distros

Our Requirements document points to the README.md about supported
distros:
https://impala.apache.org/docs/build/html/topics/impala_prereqs.html

However, README.md doesn't mention it. This patch adds a section for
this.

Change-Id: I7104c24112d3ee298a9c9edd07e267b39bc77fa6
Reviewed-on: http://gerrit.cloudera.org:8080/17583
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Document supported distros for Impala 4.0
> -
>
> Key: IMPALA-9659
> URL: https://issues.apache.org/jira/browse/IMPALA-9659
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: Quanlong Huang
>Priority: Blocker
>
> We don't appear to document which distributions Impala is actually supported 
> on. We should clarify this going forward in Impala 4.0. We already sent out a 
> mail to the user list with a proposal: 
> https://mail-archives.apache.org/mod_mbox/impala-user/202004.mbox/browser
> I think de-facto it is Ubuntu 16.04 and 18.04, CentOS/RHEL7 and soon 8 (and 
> compatible variants) and maybe SLES12






[jira] [Created] (IMPALA-10808) Crash of illegal decimal schema in test_fuzz_decimal_tbl

2021-07-20 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-10808:
---

 Summary: Crash of illegal decimal schema in test_fuzz_decimal_tbl
 Key: IMPALA-10808
 URL: https://issues.apache.org/jira/browse/IMPALA-10808
 Project: IMPALA
  Issue Type: Bug
Reporter: Quanlong Huang


Recently, two unrelated jobs failed with the same crash:
 * [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14369]
 * [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14381]

For example, in the second job, the test that crashes impalad is:
{code}
query_test/test_scanners_fuzz.py::TestScannersFuzzing::()::test_fuzz_decimal_tbl[protocol:beeswax|exec_option:{'debug_action':'-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@0.5';'abort_on_error':False;'mem_limit':'512m';'num_nodes':0}|table_format:parquet/none
{code}

The failure is
{code:java}
I0720 03:34:53.168516 126039 runtime-state.cc:196] 8a42e69ff49106c8:d2096a71] Error from query 8a42e69ff49106c8:d2096a70: File 'hdfs://localhost:20500/test-warehouse/test_fuzz_decimal_tbl_4a8e12be.db/decimal_tbl/d6=1/copy1_6b48619353a75ffb-66460f74_973668612_data.0.parq' column 'd1' does not have the decimal precision set.
F0720 03:34:53.168567 126039 types.h:282] 8a42e69ff49106c8:d2096a71] Check failed: precision > 0 (0 vs. 0)
{code}

CC [~boroknagyz] who owns the first job.
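
The log shows an error being reported for the column, followed by a fatal check on the same precision value. As a hedged illustration only (this is not Impala's code), the sketch below validates a decimal precision up front and returns an error message instead of asserting. The function name `validate_decimal_precision` and the upper bound of 38 digits are assumptions made for this example.

```python
from typing import Optional

MAX_PRECISION = 38  # assumed upper bound for this sketch


def validate_decimal_precision(column: str,
                               precision: Optional[int]) -> Optional[str]:
    """Return an error message for an invalid precision, or None if valid."""
    if precision is None or precision <= 0:
        # Report the problem to the caller rather than failing a hard check.
        return f"column '{column}' does not have the decimal precision set."
    if precision > MAX_PRECISION:
        return f"column '{column}' has precision {precision} > {MAX_PRECISION}."
    return None


error = validate_decimal_precision("d1", 0)
ok = validate_decimal_precision("d1", 9)
```

With this shape, a fuzzed file whose schema carries a zero precision produces an error for that column and never reaches code that assumes `precision > 0`.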





