[jira] [Created] (IMPALA-12312) Support using non-default executor groups for planner

2023-07-24 Thread YifanZhang (Jira)
YifanZhang created IMPALA-12312:
---

 Summary: Support using non-default executor groups for planner
 Key: IMPALA-12312
 URL: https://issues.apache.org/jira/browse/IMPALA-12312
 Project: IMPALA
  Issue Type: Improvement
Reporter: YifanZhang


An Impala cluster can contain multiple executor group sets, including a default 
executor group and non-default executor groups. The current planner always 
picks the default executor group for scheduling a query because only 
the default executor group is included in the update message that is sent to 
the frontend.

When we set a non-default request_pool for a query, the generated plan is still 
based on the default executor group, which leads to an inaccurate execution 
plan.

This can be improved by sending all configured executor groups' information to 
the frontend.
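
The effect can be illustrated with a minimal sketch. Everything here is hypothetical (`GroupInfo`, `pick_group_for_planning`, the pool-keyed mapping, and the executor counts); it is not Impala's frontend API, only an illustration of why a planner that is told about the default group alone cannot plan against the group that serves another request pool.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class GroupInfo:
    """Hypothetical summary of one executor group as seen by the frontend."""
    name: str
    num_executors: int


def pick_group_for_planning(request_pool: Optional[str],
                            known_groups: Dict[str, GroupInfo],
                            default_group: str = "default") -> GroupInfo:
    """Return the group mapped to request_pool, else fall back to the default."""
    if request_pool is not None and request_pool in known_groups:
        return known_groups[request_pool]
    return known_groups[default_group]


# Frontend view as described in the issue: only the default group is known.
only_default = {"default": GroupInfo("default", 10)}
# Proposed view: every configured group is sent to the frontend.
all_groups = {"default": GroupInfo("default", 10),
              "root.large": GroupInfo("root.large", 50)}

print(pick_group_for_planning("root.large", only_default).name)
print(pick_group_for_planning("root.large", all_groups).name)
```

With only the default group known, the lookup for the hypothetical pool `root.large` falls back to `default`; once all groups are known, it resolves to the matching group.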



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results

2023-07-24 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12311:
-
Description: 
We found that extra newlines are produced in the updated golden file when the 
actual results do not match the expected results specified in the original 
golden file.

Take 
[TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75]
 for example, this test runs the test cases in 
[decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test].

Suppose that we modify the expected error message at 
[https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107]
 from "UDF WARNING: Decimal expression overflowed, returning NULL" to the 
following (the original string with an additional "x").
{noformat}
UDF WARNING: Decimal expression overflowed, returning NULLx
{noformat}
Then we run this test using the following command with the command line 
argument '--update_results'.
{code:java}
$IMPALA_HOME/bin/impala-py.test \
--update_results \
--junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \
$IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs
{code}
In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the 
following subsection corresponding to the query. There are 3 additional 
newlines in the subsection of 'ERRORS'.
{noformat}
 ERRORS
UDF WARNING: Decimal expression overflowed, returning NULL




{noformat}
One of the newlines was produced in 
[join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298].
 This function is called when the actual results do not match the expected 
results in the following 4 places.
 # [test_section['ERRORS'] = 
join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398].
 # [test_section['TYPES'] = join_section_lines(\[', 
'.join(actual_types)\])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429].
 # [test_section['LABELS'] = join_section_lines(\[', 
'.join(actual_labels)\])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L451].
 # [test_section[result_section] = 
join_section_lines(actual.result_list)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L489].

Thus, we also have the same issue for subsections like TYPES, LABELS, and 
RESULTS in such a scenario (actual results do not match expected ones). It 
would be good if a user/developer does not have to manually remove those extra 
newlines when trying to generate the golden files for new test files.

  was:
We found that extra newlines are produced in the updated golden file when the 
actual results do not match the expected results specified in the original 
golden file.

Take 
[TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75]
 for example, this test runs the test cases in 
[decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test].

Suppose that we modify the expected error message at 
[https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107]
 from "UDF WARNING: Decimal expression overflowed, returning NULL" to the 
following (the original string with an additional "x").
{noformat}
UDF WARNING: Decimal expression overflowed, returning NULLx
{noformat}
Then we run this test using the following command with the command line 
argument '--update_results'.
{code:java}
$IMPALA_HOME/bin/impala-py.test \
--update_results \
--junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \
$IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs
{code}
In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the 
following subsection corresponding to the query. There are 3 additional 
newlines in the subsection of 'ERRORS'.
{noformat}
 ERRORS
UDF WARNING: Decimal expression overflowed, returning NULL




{noformat}
One of the newlines was produced in 
[join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298].
 This function is called when the actual results do not match the expected 
results in the following 4 places.
 # [test_section['ERRORS'] = 
join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398].
 # [test_section['TYPES'] = join_section_lines([', 
'.join(actual_types)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429].
 # 

[jira] [Updated] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results

2023-07-24 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12311:
-
Labels: test-infra  (was: )

> Extra newlines are produced when an end-to-end test is run with 
> update_results 
> ---
>
> Key: IMPALA-12311
> URL: https://issues.apache.org/jira/browse/IMPALA-12311
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Minor
>  Labels: test-infra
>
> We found that extra newlines are produced in the updated golden file when the 
> actual results do not match the expected results specified in the original 
> golden file.
> Take 
> [TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75]
>  for example, this test runs the test cases in 
> [decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test].
> Suppose that we modify the expected error message at 
> [https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107]
>  from "UDF WARNING: Decimal expression overflowed, returning NULL" to the 
> following (the original string with an additional "x").
> {noformat}
> UDF WARNING: Decimal expression overflowed, returning NULLx
> {noformat}
> Then we run this test using the following command with the command line 
> argument '--update_results'.
> {code:java}
> $IMPALA_HOME/bin/impala-py.test \
> --update_results \
> --junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \
> $IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs
> {code}
> In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the 
> following subsection corresponding to the query. There are 3 additional 
> newlines in the subsection of 'ERRORS'.
> {noformat}
>  ERRORS
> UDF WARNING: Decimal expression overflowed, returning NULL
> 
> {noformat}
> One of the newlines was produced in 
> [join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298].
>  This function is called when the actual results do not match the expected 
> results in the following 4 places.
>  # [test_section['ERRORS'] = 
> join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398].
>  # [test_section['TYPES'] = join_section_lines([', 
> '.join(actual_types)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429].
>  # [test_section['LABELS'] = join_section_lines([', 
> '.join(actual_labels)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L451].
>  # [test_section[result_section] = 
> join_section_lines(actual.result_list)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L489].
> Thus, we also have the same issue for subsections like TYPES, LABELS, and 
> RESULTS in such a scenario (actual results do not match expected ones). It 
> would be good if a user/developer does not have to manually remove those 
> extra newlines when trying to generate the golden files for new test files.






[jira] [Updated] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results

2023-07-24 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12311:
-
Affects Version/s: Impala 4.1.2

> Extra newlines are produced when an end-to-end test is run with 
> update_results 
> ---
>
> Key: IMPALA-12311
> URL: https://issues.apache.org/jira/browse/IMPALA-12311
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1.2
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Minor
>  Labels: test-infra
>
> We found that extra newlines are produced in the updated golden file when the 
> actual results do not match the expected results specified in the original 
> golden file.
> Take 
> [TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75]
>  for example, this test runs the test cases in 
> [decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test].
> Suppose that we modify the expected error message at 
> [https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107]
>  from "UDF WARNING: Decimal expression overflowed, returning NULL" to the 
> following (the original string with an additional "x").
> {noformat}
> UDF WARNING: Decimal expression overflowed, returning NULLx
> {noformat}
> Then we run this test using the following command with the command line 
> argument '--update_results'.
> {code:java}
> $IMPALA_HOME/bin/impala-py.test \
> --update_results \
> --junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \
> $IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs
> {code}
> In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the 
> following subsection corresponding to the query. There are 3 additional 
> newlines in the subsection of 'ERRORS'.
> {noformat}
>  ERRORS
> UDF WARNING: Decimal expression overflowed, returning NULL
> 
> {noformat}
> One of the newlines was produced in 
> [join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298].
>  This function is called when the actual results do not match the expected 
> results in the following 4 places.
>  # [test_section['ERRORS'] = 
> join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398].
>  # [test_section['TYPES'] = join_section_lines([', 
> '.join(actual_types)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429].
>  # [test_section['LABELS'] = join_section_lines([', 
> '.join(actual_labels)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L451].
>  # [test_section[result_section] = 
> join_section_lines(actual.result_list)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L489].
> Thus, we also have the same issue for subsections like TYPES, LABELS, and 
> RESULTS in such a scenario (actual results do not match expected ones). It 
> would be good if a user/developer does not have to manually remove those 
> extra newlines when trying to generate the golden files for new test files.






[jira] [Created] (IMPALA-12311) Extra newlines are produced when an end-to-end test is run with update_results

2023-07-24 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12311:


 Summary: Extra newlines are produced when an end-to-end test is 
run with update_results 
 Key: IMPALA-12311
 URL: https://issues.apache.org/jira/browse/IMPALA-12311
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


We found that extra newlines are produced in the updated golden file when the 
actual results do not match the expected results specified in the original 
golden file.

Take 
[TestDecimalExprs::test_exprs()|https://github.com/apache/impala/blob/master/tests/query_test/test_decimal_queries.py#L75]
 for example, this test runs the test cases in 
[decimal-exprs.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test].

Suppose that we modify the expected error message at 
[https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test#L107]
 from "UDF WARNING: Decimal expression overflowed, returning NULL" to the 
following (the original string with an additional "x").
{noformat}
UDF WARNING: Decimal expression overflowed, returning NULLx
{noformat}
Then we run this test using the following command with the command line 
argument '--update_results'.
{code:java}
$IMPALA_HOME/bin/impala-py.test \
--update_results \
--junitxml=$IMPALA_EE_TEST_LOGS_DIR/results/test_decimal.xml \
$IMPALA_HOME/tests/query_test/test_decimal_queries.py::TestDecimalExprs::test_exprs
{code}
In $IMPALA_HOME/logs/ee_tests/QueryTest_decimal-exprs.test, we will find the 
following subsection corresponding to the query. There are 3 additional 
newlines in the subsection of 'ERRORS'.
{noformat}
 ERRORS
UDF WARNING: Decimal expression overflowed, returning NULL




{noformat}
One of the newlines was produced in 
[join_section_lines()|https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py#L298].
 This function is called when the actual results do not match the expected 
results in the following 4 places.
 # [test_section['ERRORS'] = 
join_section_lines(actual_errors)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L398].
 # [test_section['TYPES'] = join_section_lines([', 
'.join(actual_types)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L429].
 # [test_section['LABELS'] = join_section_lines([', 
'.join(actual_labels)])|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L451].
 # [test_section[result_section] = 
join_section_lines(actual.result_list)|https://github.com/apache/impala/blob/master/tests/common/test_result_verifier.py#L489].

Thus, we also have the same issue for subsections like TYPES, LABELS, and 
RESULTS in such a scenario (actual results do not match expected ones). It 
would be good if a user/developer does not have to manually remove those extra 
newlines when trying to generate the golden files for new test files.
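
How such newlines can accumulate is easy to reproduce in isolation. The sketch below is hypothetical: `join_lines_with_trailing_newline`, `write_section`, and `write_section_normalized` are stand-ins invented for illustration and are not the code in test_file_parser.py or test_result_verifier.py. It assumes a joiner that terminates its output with a newline and a writer that appends another one, then shows one possible normalization.

```python
from typing import List


def join_lines_with_trailing_newline(lines: List[str]) -> str:
    """Hypothetical joiner that terminates the section body with a newline."""
    return "\n".join(lines) + "\n"


def write_section(name: str, body: str) -> str:
    """Hypothetical writer that also appends a newline after the body."""
    return name + "\n" + body + "\n"


def write_section_normalized(name: str, body: str) -> str:
    """Strip trailing newlines first so exactly one terminator is written."""
    return name + "\n" + body.rstrip("\n") + "\n"


errors = ["UDF WARNING: Decimal expression overflowed, returning NULL"]
body = join_lines_with_trailing_newline(errors)

raw = write_section("ERRORS", body)
clean = write_section_normalized("ERRORS", body)

# repr() makes the trailing newlines visible.
print(repr(raw))
print(repr(clean))
```

Each layer that adds its own terminator contributes one extra blank line, so stripping at a single, well-defined point (here the writer) is one way to keep regenerated golden files stable.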






[jira] [Commented] (IMPALA-12307) test_75_percent_availability fails on object stores

2023-07-24 Thread Riza Suminto (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746663#comment-17746663
 ] 

Riza Suminto commented on IMPALA-12307:
---

Fixing the issue with this patch:
 
[IMPALA-12300|http://issues.apache.org/jira/browse/IMPALA-12300]: (addendum) 
Remove HDFS specific assertion
[https://gerrit.cloudera.org/c/20259/] 

> test_75_percent_availability fails on object stores
> ---
>
> Key: IMPALA-12307
> URL: https://issues.apache.org/jira/browse/IMPALA-12307
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Riza Suminto
>Priority: Major
>  Labels: broken-build
>
> test_75_percent_availability fails on Ozone and S3.
> The test expects the string "SCAN HDFS" to be found in the profile.
> Instead, the profiles contain "SCAN OZONE" and "SCAN S3", respectively.
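
For illustration only, a filesystem-agnostic check could look like the sketch below. Note that the patch linked above takes a different route and removes the HDFS-specific assertion; the helper name `profile_has_scan_node` and the pattern are assumptions, and the pattern covers only the three labels mentioned in this issue.

```python
import re

# Scan-node labels named in the issue; other filesystems are not covered here.
SCAN_NODE_RE = re.compile(r"SCAN (HDFS|OZONE|S3)\b")


def profile_has_scan_node(profile: str) -> bool:
    """Return True if the profile text mentions any known scan-node label."""
    return SCAN_NODE_RE.search(profile) is not None


print(profile_has_scan_node("00:SCAN HDFS [functional.alltypes]"))
print(profile_has_scan_node("00:SCAN OZONE [functional.alltypes]"))
print(profile_has_scan_node("01:AGGREGATE [FINALIZE]"))
```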






[jira] [Resolved] (IMPALA-12307) test_75_percent_availability fails on object stores

2023-07-24 Thread Riza Suminto (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto resolved IMPALA-12307.
---
Fix Version/s: Impala 4.3.0
   Resolution: Fixed

> test_75_percent_availability fails on object stores
> ---
>
> Key: IMPALA-12307
> URL: https://issues.apache.org/jira/browse/IMPALA-12307
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Riza Suminto
>Priority: Major
>  Labels: broken-build
> Fix For: Impala 4.3.0
>
>
> test_75_percent_availability fails on Ozone and S3.
> The test expects the string "SCAN HDFS" to be found in the profile.
> Instead, the profiles contain "SCAN OZONE" and "SCAN S3", respectively.






[jira] [Assigned] (IMPALA-5081) Expose IR optimization level via query option

2023-07-24 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-5081:
-

Assignee: Michael Smith

> Expose IR optimization level via query option
> -
>
> Key: IMPALA-5081
> URL: https://issues.apache.org/jira/browse/IMPALA-5081
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Michael Ho
>Assignee: Michael Smith
>Priority: Minor
>  Labels: codegen
>
> Certain queries may spend a lot of time in IR optimization. Currently, 
> there is a start-up option to disable optimization in LLVM. However, it is 
> inconvenient for users to have to restart the entire Impala cluster just to 
> use that option. This JIRA explores exposing a query option that lets 
> users choose the optimization level for a given query (e.g. a level that 
> runs only a dead code elimination pass, or no optimization at 
> all).






[jira] [Closed] (IMPALA-6288) Impala C++ test libraries are surprisingly large

2023-07-24 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith closed IMPALA-6288.
-
Resolution: Not A Problem

> Impala C++ test libraries are surprisingly large
> 
>
> Key: IMPALA-6288
> URL: https://issues.apache.org/jira/browse/IMPALA-6288
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Reporter: Philip Martin
>Priority: Major
>
> The C++ tests are 60MB or so each.
> {code}
> $ls -l build/debug/rpc/*-test
> -rwxrwxr-x 1 philip philip 60492088 Nov  6 17:11 
> build/debug/rpc/authentication-test*
> -rwxrwxr-x 1 philip philip 61593808 Nov  6 17:11 build/debug/rpc/rpc-mgr-test*
> -rwxrwxr-x 1 philip philip 63047936 Nov  6 17:11 
> build/debug/rpc/thrift-server-test*
> -rwxrwxr-x 1 philip philip 60489200 Nov  6 17:11 
> build/debug/rpc/thrift-util-test*
> {code}
> I don't have a super clear picture of what's going on, but I think we might 
> be statically linking against LLVM even when we try to link dynamically. 
> Using {{nm}} to look at symbols, I can see that about 24MB (of 62MB) is used 
> by llvm-looking things. 
> {code}
> $ nm --demangle --print-size --size-sort --radix=d util/lru-cache-test  | 
> sort -k 2 -n | grep llvm:: | awk '{ x += $2 } END { print x }'
> 23691589
> {code}
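
The same aggregation can be written as a small Python helper, which may be easier to adapt than the awk pipeline. This is a sketch: `sum_symbol_sizes` is a hypothetical name, it assumes the `address size type name` layout that `nm --print-size --radix=d` prints, and the sample lines are made up for the example rather than taken from a real binary.

```python
def sum_symbol_sizes(nm_output: str, needle: str = "llvm::") -> int:
    """Sum the size column (2nd field) of nm lines whose text contains needle."""
    total = 0
    for line in nm_output.splitlines():
        if needle not in line:
            continue
        fields = line.split()
        # Skip lines without a decimal size column (e.g. undefined symbols).
        if len(fields) >= 2 and fields[1].isdigit():
            total += int(fields[1])
    return total


# Made-up sample in the "address size type name" layout, sizes in decimal.
sample = """\
0000000004198400 0000000000000120 T llvm::Value::getName() const
0000000004198520 0000000000000064 T impala::Status::Status()
0000000004198584 0000000000000300 W llvm::Module::~Module()
"""

print(sum_symbol_sizes(sample))  # 420
```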






[jira] [Assigned] (IMPALA-2651) codegen overhead can be high

2023-07-24 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-2651:
-

Assignee: Michael Smith

> codegen overhead can be high
> 
>
> Key: IMPALA-2651
> URL: https://issues.apache.org/jira/browse/IMPALA-2651
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.2, Impala 2.3.0
>Reporter: Silvius Rus
>Assignee: Michael Smith
>Priority: Minor
>  Labels: codegen, performance
>
> We received reports of excessive codegen compilation/optimization times for 
> very large expressions generated by visualization tools.
> We should:
> # Expose codegen optimization levels as query options.  Currently there is 
> only an all or nothing codegen query option.  It's likely that overly complex 
> expressions such as hundreds of cascading conditions take very long and 
> benefit very little from an O2 optimization level, but they could still run 
> significantly faster even at O0 or O1 versus interpreted.
> # Consider dropping to O1 (or turn off riskier passes individually) 
> automatically for very large expressions.
> # Consider parameterizing the compilation duration time limits and set a 
> reasonable default, say 10 seconds.  Either disable codegen or reduce it to, 
> say, O0 if compilation takes longer than the preset limit.
> *Workaround*
> In some cases disabling codegen can help.
> {code}
> SET disable_codegen=true;
> {code}
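
Items 2 and 3 of the proposal above can be sketched as a simple policy. This is not Impala code: the function names and the instruction-count threshold are hypothetical, and only the 10 second default comes from the issue text.

```python
from typing import Optional


def initial_opt_level(num_instructions: int,
                      large_threshold: int = 100_000) -> str:
    """Item 2: start very large expressions at O1 instead of O2.

    large_threshold is an arbitrary placeholder, not a tuned value.
    """
    return "O1" if num_instructions > large_threshold else "O2"


def level_after_compile(elapsed_s: float, level: str,
                        limit_s: float = 10.0) -> Optional[str]:
    """Item 3: past the time limit, drop to O0; None means disable codegen."""
    if elapsed_s <= limit_s:
        return level
    return "O0" if level != "O0" else None


print(initial_opt_level(8_213))            # O2
print(initial_opt_level(500_000))          # O1
print(level_after_compile(12.5, "O1"))     # O0
print(level_after_compile(12.5, "O0"))     # None
```

Returning `None` for "disable codegen" keeps the two outcomes of item 3 (reduce the level, or give up on codegen) distinguishable for the caller.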






[jira] [Resolved] (IMPALA-12119) compilation fails on arm64

2023-07-24 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-12119.

Fix Version/s: Impala 4.3.0
   Resolution: Fixed

> compilation fails on arm64
> --
>
> Key: IMPALA-12119
> URL: https://issues.apache.org/jira/browse/IMPALA-12119
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Sebastian Pop
>Assignee: Sebastian Pop
>Priority: Minor
> Fix For: Impala 4.3.0
>
>
> Compiling impala on arm64 fails with an error of redefined FORCE_INLINE 
> preprocessor macro:
> {{/home/spop/impala/be/src/util/sse2neon.h:48: error: "FORCE_INLINE" 
> redefined [-Werror]
>    48 | #define FORCE_INLINE static inline __attribute__((always_inline))
>       | 
> In file included from 
> /home/spop/impala/be/src/codegen/llvm-codegen-cache.h:30,
>                  from /home/spop/impala/be/src/runtime/exec-env.cc:27:
> /home/spop/impala/be/src/thirdparty/datasketches/MurmurHash3.h:45: note: this 
> is the location of the previous definition
>    45 | #define FORCE_INLINE inline __attribute__((always_inline))
>       | 
> }}






[jira] [Assigned] (IMPALA-3612) Investigate ways to reduce optimization overhead of LLVM

2023-07-24 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-3612:
-

Assignee: Michael Smith

> Investigate ways to reduce optimization overhead of LLVM
> 
>
> Key: IMPALA-3612
> URL: https://issues.apache.org/jira/browse/IMPALA-3612
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.6.0
>Reporter: Mostafa Mokhtar
>Assignee: Michael Smith
>Priority: Minor
>  Labels: codegen
> Attachments: LLVM CallStack.csv, TPC-DS Q64 profile.txt
>
>
> Short-running queries in 2.6 regressed by between 100 ms and 500 ms due to an 
> increase in codegen time. This is mainly due to new code generation for hash 
> table build and prefetching. 
> In 2.6 
> {code}
>  CodeGen:(Total: 1s588ms, non-child: 1s588ms, % non-child: 100.00%)
>  - CodegenTime: 2.897ms
>  - CompileTime: 403.961ms
>  - LoadTime: 0.000ns
>  - ModuleBitcodeSize: 2.24 MB (2353348)
>  - NumFunctions: 272 (272)
>  - NumInstructions: 8.21K (8213)
>  - OptimizationTime: 971.162ms
>  - PrepareTime: 212.672ms
> {code}
> In 2.5
> {code}
>   CodeGen:(Total: 1s028ms, non-child: 1s028ms, % non-child: 100.00%)
>  - CodegenTime: 3.608ms
>  - CompileTime: 246.771ms
>  - LoadTime: 0.000ns
>  - ModuleBitcodeSize: 1.85 MB (1939948)
>  - OptimizationTime: 605.070ms
>  - PrepareTime: 174.832ms
> {code}
> Attached is the call stack for LLVM.
> This is the query I used:
> {code}
> select 
> count(*)
> from
> (select distinct
> iss.i_brand_id brand_id,
> iss.i_class_id class_id,
> iss.i_category_id category_id
> from
> (select 
> *,
> (ss_wholesale_cost * ss_sales_price * ss_ext_discount_amt * 
> ss_ext_wholesale_cost * ss_item_sk * ss_cdemo_sk + 1 - 10 + ss_store_sk + 
> ss_store_sk) * 33 * 
> (ss_wholesale_cost * ss_sales_price * ss_ext_discount_amt * 
> ss_ext_wholesale_cost * ss_item_sk * ss_cdemo_sk + 1 - 10 + ss_store_sk + 
> ss_store_sk) * 
> (ss_wholesale_cost * ss_sales_price * ss_ext_discount_amt * 
> ss_ext_wholesale_cost * ss_item_sk * ss_cdemo_sk + 1 - 10 + ss_store_sk + 
> ss_store_sk) * 
> (ss_wholesale_cost * ss_sales_price * ss_ext_discount_amt * 
> ss_ext_wholesale_cost * ss_item_sk * ss_cdemo_sk + 1 - 10 + ss_store_sk + 
> ss_store_sk) * 
> (ss_wholesale_cost * ss_sales_price * ss_ext_discount_amt * 
> ss_ext_wholesale_cost * ss_item_sk * ss_cdemo_sk + 1 - 10 + ss_store_sk + 
> ss_store_sk) + 10 -1 * 
> (ss_wholesale_cost * ss_sales_price * ss_ext_discount_amt * 
> ss_ext_wholesale_cost * ss_item_sk * ss_cdemo_sk + 1 - 10 + ss_store_sk + 
> ss_store_sk) * ss_coupon_amt
> * cast ("10" as double) 
> from
> store_sales 
> limit 1) store_sales, (select 
> *
> from
> item
> limit 1) iss, (select 
> *
> from
> date_dim
> limit 1) d1
> where
> ss_item_sk = iss.i_item_sk
> and ss_sold_date_sk = d1.d_date_sk
> and d1.d_year between 1999 AND 1999 + 2 and 
> (ss_wholesale_cost * ss_sales_price * ss_ext_discount_amt * 
> ss_ext_wholesale_cost * ss_item_sk * ss_cdemo_sk + 1 - 10 + ss_store_sk + 
> ss_store_sk) + (ss_wholesale_cost * ss_sales_price * ss_ext_discount_amt * 
> ss_ext_wholesale_cost * ss_item_sk * ss_cdemo_sk + 1 - 10 + ss_store_sk + 
> ss_store_sk) = 19 and 
> (ss_wholesale_cost * ss_sales_price * ss_ext_discount_amt * 
> ss_ext_wholesale_cost * ss_item_sk * ss_cdemo_sk + 1 - 10 + ss_store_sk + 
> ss_store_sk)  < 0 and
> (ss_wholesale_cost * ss_sales_price * ss_ext_discount_amt * 
> ss_ext_wholesale_cost * ss_item_sk * ss_cdemo_sk + 1 - 10 + ss_store_sk + 
> ss_store_sk) > 10 and
> (ss_wholesale_cost * ss_sales_price * ss_ext_discount_amt * 
> ss_ext_wholesale_cost * ss_item_sk * ss_cdemo_sk + 1 - 10 + ss_store_sk + 
> ss_store_sk) = 100 and 
> ss_customer_sk + 10 - 1 + ss_hdemo_sk = 100 and cast (ss_quantity as string) 
> = "vv") a;
> {code}
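
The per-counter differences between the 2.6 and 2.5 profiles quoted above can be worked out directly. The values are copied from the two CodeGen sections and converted to milliseconds (1s588ms = 1588 ms); `deltas_ms` is just a helper name for this sketch.

```python
# Counters copied from the two CodeGen profile sections, in milliseconds.
profile_26 = {"Total": 1588.0, "CompileTime": 403.961,
              "OptimizationTime": 971.162, "PrepareTime": 212.672}
profile_25 = {"Total": 1028.0, "CompileTime": 246.771,
              "OptimizationTime": 605.070, "PrepareTime": 174.832}


def deltas_ms(new: dict, old: dict) -> dict:
    """Return new - old for every counter present in both profiles."""
    return {k: round(new[k] - old[k], 3) for k in new if k in old}


for name, delta in deltas_ms(profile_26, profile_25).items():
    print(f"{name}: +{delta} ms")
```

For this query the total grows by 560 ms, of which OptimizationTime accounts for about 366 ms (roughly 65%), CompileTime for about 157 ms, and PrepareTime for about 38 ms.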






[jira] [Assigned] (IMPALA-11917) Upgrade to LLVM 7.0.0 or higher

2023-07-24 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-11917:
--

Assignee: Michael Smith

> Upgrade to LLVM 7.0.0 or higher
> ---
>
> Key: IMPALA-11917
> URL: https://issues.apache.org/jira/browse/IMPALA-11917
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Michael Smith
>Priority: Major
>
> LLVM 7.0 adds a JITEventListener for integration with Linux perf in 
> createPerfJITEventListener(). We can consider replacing our customized 
> listener (CodegenSymbolEmitter) with it.
> [https://github.com/llvm/llvm-project/commit/376a3d3659e3ee5ea47517e3e43022f0306ecc74]
> [https://releases.llvm.org/7.0.0/docs/ReleaseNotes.html]






[jira] [Commented] (IMPALA-12212) Upgrade Maven to 3.9 to enable parallel dependency downloads

2023-07-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746625#comment-17746625
 ] 

ASF subversion and git services commented on IMPALA-12212:
--

Commit ee069687fcaa06c29404e2220ff577767d905a98 in impala's branch 
refs/heads/master from Laszlo Gaal
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ee069687f ]

IMPALA-12212: Bump Maven to 3.9.2, pull dependencies in parallel

Maven 3.9.x offers a new dependency resolver, HttpClient, which allows
downloading project dependencies in parallel.

This patch bumps the Maven version installed by bootstrap_system.sh to
v3.9.2, and adds the flags enabling the new resolver to download
dependencies (including POM files) in parallel. Parallelism is set to
10 threads.

The flags are added to a project-specific Maven setting file in the
newly created java/.mvn directory. The settings file is added to the
RAT exclusion list in bin/rat_exclude_files.txt.

The --show-version flag is added for debugging purposes.

The same flags are added to the JAMM subproject as well.

The new resolver in Maven 3.9 has also changed the warning message
emitted for missing component checksums, so the new warning string
is added to the filter in bin/mvn-quiet.sh.
Unfortunately, Maven 3.9 has also changed the way it responds to missing
checksum files: the resolver now emits a stack trace when checksums
cannot be determined, and missing checksums are not explicitly ignored.

Detailed documentation for the new Maven resolver in Maven 3.9.0+ is
located at:
https://maven.apache.org/guides/mini/guide-resolver-transport.html
resolver configuration reference:
https://maven.apache.org/resolver/configuration.html

Tests:
- verified in a core-mode test run with Maven 3.9.2 installed
- verified in a local build using an earlier version of Maven
  to verify that the new default setting does not cause regressions
  with the old dependency resolver.

Change-Id: I75d05215effc724f5bd471646fb352f37443e185
Reviewed-on: http://gerrit.cloudera.org:8080/20142
Tested-by: Impala Public Jenkins 
Reviewed-by: Michael Smith 


> Upgrade Maven to 3.9 to enable parallel dependency downloads
> 
>
> Key: IMPALA-12212
> URL: https://issues.apache.org/jira/browse/IMPALA-12212
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend, Infrastructure
>Affects Versions: Impala 4.3.0
>Reporter: Laszlo Gaal
>Assignee: Laszlo Gaal
>Priority: Major
> Fix For: Impala 4.3.0
>
>
> Maven 3.9 introduced a new dependency resolver (the native HTTP 
> resolver)[1][2], which offers the option of traversing a project's dependency 
> tree in a breadth-first manner in addition to the old default depth-first 
> traversal[3]. This new mode also enables the parallel download of dependency 
> POMs, which had to happen serially with the old resolver.
> Impala should embrace the new download mechanism, as it could speed up the 
> dependency download phase of the frontend build significantly.
> [1] https://maven.apache.org/docs/3.9.0/release-notes.html, [2] 
> https://maven.apache.org/guides/mini/guide-resolver-transport.html
> [3] see the {{aether.dependencyCollector.impl}} option at 
> https://maven.apache.org/resolver/configuration.html
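
To illustrate why breadth-first traversal makes parallel POM downloads possible, here is a minimal Python sketch. It is not Maven's resolver code: the `POMS` graph, `fetch_pom`, and `collect_breadth_first` are hypothetical stand-ins. The point is only that every POM on one level of the tree is known before any of them is fetched, so a whole level can be requested concurrently, whereas a depth-first walk discovers one POM at a time.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dependency graph standing in for POM files:
# artifact name -> direct dependencies.
POMS = {
    "app": ["lib-a", "lib-b"],
    "lib-a": ["lib-c"],
    "lib-b": ["lib-c", "lib-d"],
    "lib-c": [],
    "lib-d": [],
}


def fetch_pom(name):
    """Stand-in for downloading one POM; returns its direct dependencies."""
    return POMS[name]


def collect_breadth_first(root, threads=10):
    """Resolve the tree level by level, fetching each level's POMs in parallel."""
    seen = [root]
    level = [root]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        while level:
            next_level = []
            # map() preserves input order, so the result is deterministic.
            for deps in pool.map(fetch_pom, level):
                for dep in deps:
                    if dep not in seen:
                        seen.append(dep)
                        next_level.append(dep)
            level = next_level
    return seen


print(collect_breadth_first("app"))
```

The default of 10 threads mirrors the parallelism mentioned in the commit message; everything else is illustrative.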






[jira] [Resolved] (IMPALA-12212) Upgrade Maven to 3.9 to enable parallel dependency downloads

2023-07-24 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-12212.

Fix Version/s: Impala 4.3.0
   Resolution: Fixed

> Upgrade Maven to 3.9 to enable parallel dependency downloads
> 
>
> Key: IMPALA-12212
> URL: https://issues.apache.org/jira/browse/IMPALA-12212
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend, Infrastructure
>Affects Versions: Impala 4.3.0
>Reporter: Laszlo Gaal
>Assignee: Laszlo Gaal
>Priority: Major
> Fix For: Impala 4.3.0
>
>
> Maven 3.9 introduced a new dependency resolver (the native HTTP 
> resolver)[1][2], which offers the option of traversing a project's dependency 
> tree in a breadth-first manner in addition to the old default depth-first 
> traversal[3]. This new mode also enables the parallel download of dependency 
> POMs, which had to happen serially with the old resolver.
> Impala should embrace the new download mechanism, as it could speed up the 
> dependency download phase of the frontend build significantly.
> [1] https://maven.apache.org/docs/3.9.0/release-notes.html, [2] 
> https://maven.apache.org/guides/mini/guide-resolver-transport.html
> [3] see the {{aether.dependencyCollector.impl}} option at 
> https://maven.apache.org/resolver/configuration.html






[jira] [Updated] (IMPALA-12305) Impala server hanging when processing DDL if CatalogD HA is enabled

2023-07-24 Thread Abhishek Rawat (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rawat updated IMPALA-12305:

Priority: Critical  (was: Major)

> Impala server hanging when processing DDL if CatalogD HA is enabled 
> 
>
> Key: IMPALA-12305
> URL: https://issues.apache.org/jira/browse/IMPALA-12305
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Critical
> Fix For: Impala 4.3.0
>
>
> In IMPALA-12286, catalogd regenerates its Catalog Service ID in JniCatalog 
> when it becomes active, but CatalogServiceCatalog is not updated when the new 
> Catalog Service ID is generated. This causes the coordinator to hang when 
> processing DDLs.
> In the CatalogServer class, the member variable is_active_ is not protected by 
> the mutex catalog_lock_, and pending_topic_updates_ is not cleared when the 
> catalogd becomes active. It is possible for the catalog server to send pending 
> catalog topic updates with the old Catalog Service ID and then send catalog 
> topic updates with the new Catalog Service ID.
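
The locking problem described above can be sketched in Python. This is not the CatalogServer code (which is C++) and not the actual fix; the class and method names are hypothetical. It only shows the invariant the ticket asks for: the active flag, the service ID, and the pending updates change together under one lock, so no update tagged with the old ID can be sent after activation.

```python
import threading
import uuid


class CatalogServerSketch:
    """Hypothetical model of the state guarded by catalog_lock_ in the ticket."""

    def __init__(self):
        self._lock = threading.Lock()
        self._is_active = False
        self._service_id = uuid.uuid4().hex
        self._pending_topic_updates = []

    def queue_update(self, payload):
        # Every queued update is tagged with the service ID current at that time.
        with self._lock:
            self._pending_topic_updates.append((self._service_id, payload))

    def become_active(self):
        # Regenerate the ID, drop stale updates, and flip the flag atomically.
        with self._lock:
            self._service_id = uuid.uuid4().hex
            self._pending_topic_updates.clear()
            self._is_active = True
            return self._service_id

    def drain_updates(self):
        # Only an active server sends updates; the flag is read under the lock.
        with self._lock:
            if not self._is_active:
                return []
            updates, self._pending_topic_updates = self._pending_topic_updates, []
            return updates
```

With this shape, an update queued before activation is discarded by `become_active()` rather than delivered with a stale ID.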






[jira] [Created] (IMPALA-12310) test_delete_complextypes_mixed_files failed in exhaustive build

2023-07-24 Thread Zoltán Borók-Nagy (Jira)
Zoltán Borók-Nagy created IMPALA-12310:
--

 Summary: test_delete_complextypes_mixed_files failed in exhaustive 
build
 Key: IMPALA-12310
 URL: https://issues.apache.org/jira/browse/IMPALA-12310
 Project: IMPALA
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


test_delete_complextypes_mixed_files failed in an exhaustive build.

h2. Error Message
query_test/test_iceberg.py:1235: in test_delete_complextypes_mixed_files 
unique_database) common/impala_test_suite.py:718: in run_test_case result = 
exec_fn(query, user=test_section.get('USER', '').strip() or None) 
common/impala_test_suite.py:656: in __exec_in_impala result = 
self.__execute_query(target_impalad_client, query, user=user) 
common/impala_test_suite.py:992: in __execute_query return 
impalad_client.execute(query, user=user) common/impala_connection.py:214: in 
execute return self.__beeswax_client.execute(sql_stmt, user=user) 
beeswax/impala_beeswax.py:191: in execute handle = 
self.__execute_query(query_string.strip(), user=user) 
beeswax/impala_beeswax.py:367: in __execute_query handle = 
self.execute_query_async(query_string, user=user) 
beeswax/impala_beeswax.py:361: in execute_query_async handle = 
self.__do_rpc(lambda: self.imp_service.query(query,)) 
beeswax/impala_beeswax.py:524: in __do_rpc raise 
ImpalaBeeswaxException(self.__build_error_message(b), b) E   
ImpalaBeeswaxException: ImpalaBeeswaxException: EINNER EXCEPTION:  EMESSAGE: AnalysisException: Could not 
resolve table reference: 'ice_complex_delete'

h2. Stacktrace

{noformat}
query_test/test_iceberg.py:1235: in test_delete_complextypes_mixed_files
unique_database)
common/impala_test_suite.py:718: in run_test_case
result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:656: in __exec_in_impala
result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:992: in __execute_query
return impalad_client.execute(query, user=user)
common/impala_connection.py:214: in execute
return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:191: in execute
handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:367: in __execute_query
handle = self.execute_query_async(query_string, user=user)
beeswax/impala_beeswax.py:361: in execute_query_async
handle = self.__do_rpc(lambda: self.imp_service.query(query,))
beeswax/impala_beeswax.py:524: in __do_rpc
raise ImpalaBeeswaxException(self.__build_error_message(b), b)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E    INNER EXCEPTION: 
E    MESSAGE: AnalysisException: Could not resolve table reference: 
'ice_complex_delete'
{noformat}







[jira] [Created] (IMPALA-12309) test_restart_statestore_query_resilience failed in exhaustive run

2023-07-24 Thread Zoltán Borók-Nagy (Jira)
Zoltán Borók-Nagy created IMPALA-12309:
--

 Summary: test_restart_statestore_query_resilience failed in 
exhaustive run
 Key: IMPALA-12309
 URL: https://issues.apache.org/jira/browse/IMPALA-12309
 Project: IMPALA
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


test_restart_statestore_query_resilience failed in exhaustive run.

h2. Error Message
assert 5 == 3  +  where 5 = >()  +where > = 
.get_state  +  and   3 = QueryState.RUNNING

h2. Stacktrace

{noformat}
custom_cluster/test_restart_services.py:304: in 
test_restart_statestore_query_resilience
assert client.get_state(handle) == QueryState.RUNNING
E   assert 5 == 3
E+  where 5 = >()
E+where > = 
.get_state
E+  and   3 = QueryState.RUNNING
{noformat}







[jira] [Updated] (IMPALA-12089) Be able to skip pushing down a subset of the predicates

2023-07-24 Thread David Rorke (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Rorke updated IMPALA-12089:
-
Labels: impala-iceberg performance  (was: impala-iceberg)

> Be able to skip pushing down a subset of the predicates
> ---
>
> Key: IMPALA-12089
> URL: https://issues.apache.org/jira/browse/IMPALA-12089
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Gabor Kaszab
>Assignee: Peter Rozsa
>Priority: Major
>  Labels: impala-iceberg, performance
>
> https://issues.apache.org/jira/browse/IMPALA-11701 introduced logic to skip 
> pushing down predicates to Impala scanners if they are already applied by 
> Iceberg and won't filter any further rows. This is an "all or nothing" 
> approach where we either skip pushing down all the predicates or we push down 
> all of them.
> As a more sophisticated approach we should be able to push down a subset of 
> the predicates to Impala Scan nodes. For this we should be able to map 
> Iceberg predicates (returned from residual()) to Impala predicates. This 
> might not be that trivial as Iceberg sometimes doesn't return the exact same 
> predicates as it received through planFiles(). E.g. the object ID might be 
> different making the mapping more difficult.
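
One possible way to do the mapping is sketched below in Python. The `Predicate` tuple is a hypothetical simplification; the real Impala and Iceberg expression classes are different and richer. The idea is to compare predicates by a canonical key that ignores the object ID, then push down only those Impala predicates that still show up in the residual.

```python
from collections import namedtuple

# Hypothetical, simplified predicate representation.
Predicate = namedtuple("Predicate", ["obj_id", "column", "op", "value"])


def canonical(pred):
    """Key that ignores the object ID, so equivalent predicates compare equal."""
    return (pred.column.lower(), pred.op, pred.value)


def predicates_to_push_down(impala_preds, residual_preds):
    """Keep only the Impala predicates that still appear in the Iceberg residual."""
    residual_keys = {canonical(p) for p in residual_preds}
    return [p for p in impala_preds if canonical(p) in residual_keys]


impala = [Predicate(1, "country", "=", "SG"), Predicate(2, "amount", ">", 100)]
residual = [Predicate(99, "amount", ">", 100)]
print(predicates_to_push_down(impala, residual))
```

This only covers the case where Iceberg returns a structurally identical predicate. If Iceberg rewrites an expression, a key like this would not match, and a safe fallback would be to push down all predicates.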






[jira] [Updated] (IMPALA-12136) Rewrite DELETE statements to TRUNCATE if possible

2023-07-24 Thread David Rorke (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Rorke updated IMPALA-12136:
-
Labels: impala-iceberg performance  (was: impala-iceberg)

> Rewrite DELETE statements to TRUNCATE if possible
> -
>
> Key: IMPALA-12136
> URL: https://issues.apache.org/jira/browse/IMPALA-12136
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg, performance
>
> If the user issues DELETE FROM t; to remove all rows from a table, we should 
> rewrite it to TRUNCATE TABLE t; as it is much more efficient in some cases.
> E.g., for Iceberg tables DELETE FROM t; would create delete files that 
> contain all existing rows. Then subsequent readers would have to read all 
> data files and delete files just to return an empty result set. In contrast, 
> TRUNCATE TABLE t; just creates a new empty table snapshot.
> We'll need to investigate if it makes sense for Kudu tables as well.
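
The rewrite rule can be illustrated with a small Python sketch. It works on the SQL text with a regular expression purely for illustration; an actual implementation would presumably operate on the analyzed statement, and `rewrite_delete` is a hypothetical name. Only a DELETE with no WHERE clause (or anything else after the table name) is rewritten.

```python
import re

# Matches "DELETE FROM <table>" with an optional trailing semicolon and nothing else.
_DELETE_ALL = re.compile(r"^\s*DELETE\s+FROM\s+([\w.]+)\s*;?\s*$", re.IGNORECASE)


def rewrite_delete(sql):
    """Return a TRUNCATE statement for an unconditional DELETE, else the input."""
    match = _DELETE_ALL.match(sql)
    if match is None:
        return sql
    return "TRUNCATE TABLE {0};".format(match.group(1))


print(rewrite_delete("DELETE FROM t;"))
print(rewrite_delete("DELETE FROM t WHERE id = 1;"))
```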






[jira] [Created] (IMPALA-12308) Implement DIRECTED distribution mode for Iceberg tables

2023-07-24 Thread Zoltán Borók-Nagy (Jira)
Zoltán Borók-Nagy created IMPALA-12308:
--

 Summary: Implement DIRECTED distribution mode for Iceberg tables
 Key: IMPALA-12308
 URL: https://issues.apache.org/jira/browse/IMPALA-12308
 Project: IMPALA
  Issue Type: Bug
  Components: Backend, Frontend
Reporter: Zoltán Borók-Nagy


Currently there are two distribution modes for JOIN-operators:
* BROADCAST: RHS is delivered to all executors of LHS
* PARTITIONED: both LHS and RHS are shuffled across executors

We implement reading of an Iceberg V2 table (with position delete files) via an 
ANTI JOIN operator. LHS is the SCAN operator of the data records, RHS is the 
SCAN operator of the delete records. The delete records contain the (file_path, pos) 
information of the deleted rows.

This means we can invent another distribution mode, just for Iceberg V2 tables 
with position deletes: DIRECTED distribution mode.

At scheduling time we must save the information about the data SCAN operators, 
i.e. on which nodes they are going to be executed. The LHS doesn't need to be 
shuffled over the network.
The RHS can use the scheduling information to transfer delete records to the 
hosts that process the corresponding data file.

This minimizes network communication.
We can also add further optimizations to the Iceberg V2 operator 
(IcebergDeleteNode):
* Compare the pointers of the file paths instead of doing string compare
* Each tuple in a rowbatch belongs to the same file, and positions are in 
ascending order
** Only one lookup is needed from the hash table
** We can add fast paths to skip testing the whole rowbatch (when the row 
batch's position range is outside of the delete position range)
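
A rough Python sketch of the two ideas above follows. It is not the IcebergDeleteNode implementation, and the function names and the `file_to_host` map are hypothetical. `route_delete_records` models DIRECTED distribution by sending each delete record to the host scheduled to scan its data file. `surviving_positions` models the per-batch filtering, assuming the batch and the delete positions belong to one file and are both in ascending order, including the fast path for a batch that lies outside the delete position range.

```python
from bisect import bisect_left
from collections import defaultdict


def route_delete_records(delete_records, file_to_host):
    """Group (file_path, pos) delete records by the host that scans that data file."""
    per_host = defaultdict(list)
    for file_path, pos in delete_records:
        per_host[file_to_host[file_path]].append((file_path, pos))
    return dict(per_host)


def surviving_positions(batch_positions, deleted_positions):
    """Filter one row batch; both inputs are ascending and belong to one data file."""
    if not deleted_positions or not batch_positions:
        return list(batch_positions)
    # Fast path: the batch lies entirely outside the delete position range.
    if (batch_positions[-1] < deleted_positions[0]
            or batch_positions[0] > deleted_positions[-1]):
        return list(batch_positions)
    survivors = []
    for pos in batch_positions:
        i = bisect_left(deleted_positions, pos)
        if i == len(deleted_positions) or deleted_positions[i] != pos:
            survivors.append(pos)
    return survivors


hosts = {"a.parquet": "host1", "b.parquet": "host2"}
print(route_delete_records([("a.parquet", 3), ("b.parquet", 0)], hosts))
print(surviving_positions([2, 3, 4], [3, 7]))
```

The sketch uses a binary search per row for brevity; since both sequences are sorted, a single merge-style pass would also work.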






[jira] [Updated] (IMPALA-11986) Optimize MIN(part_col)/ MAX(part_col)/ COUNT(DISTINCT part_col)/ queries for Iceberg tables

2023-07-24 Thread Zoltán Borók-Nagy (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-11986:
---
Labels: impala-iceberg performance  (was: impala-iceberg)

> Optimize MIN(part_col)/ MAX(part_col)/ COUNT(DISTINCT part_col)/ queries for 
> Iceberg tables
> ---
>
> Key: IMPALA-11986
> URL: https://issues.apache.org/jira/browse/IMPALA-11986
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Li Penglin
>Priority: Major
>  Labels: impala-iceberg, performance
>
> For Iceberg V1 and V2 tables without deletes:
> https://impala.apache.org/docs/build/html/topics/impala_optimize_partition_key_scans.html
>  OPTIMIZE_PARTITION_KEY_SCANS optimizes MIN(key_column), MAX(key_column), 
> and COUNT(DISTINCT key_column) by using the 'TBLS' table and the 
> 'PARTITION_KEY_VALS' partition key column in the HMS metadata. For Iceberg 
> tables, the partitioning stats are not stored in the HMS, but can be obtained 
> through the Iceberg API. We can optimize query performance for 
> MIN(key_column), MAX(key_column), or COUNT(DISTINCT key_column) with a similar 
> idea, but we should make sure that the partition transform is 'identity'.
> For non-partition columns, if min-max information is stored in the Iceberg 
> metadata, could MIN(column) and MAX(column) queries also be optimized based on 
> this idea?
> However, Impala does not guarantee that the statistics for these 
> non-partition columns are complete, which complicates things.
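
As a rough illustration of the idea, the Python sketch below answers the three aggregates from partition values alone. The inputs are hypothetical: `partitions` stands for the partition tuples obtained from Iceberg metadata, and `transforms` maps each partition column to its transform name. The function declines (returns `None`) unless the transform is `identity`, because other transforms such as `bucket` or `year` do not preserve the original column values.

```python
def partition_key_aggregates(partitions, column, transforms):
    """Compute MIN/MAX/COUNT(DISTINCT) for an identity-partitioned column.

    Returns None when the optimization does not apply.
    """
    if transforms.get(column) != "identity":
        return None
    # NULL partition values are ignored, as the SQL aggregates would ignore them.
    values = {p[column] for p in partitions if p[column] is not None}
    if not values:
        return None
    return {"min": min(values), "max": max(values), "count_distinct": len(values)}


parts = [{"year": 2021}, {"year": 2023}, {"year": 2021}]
print(partition_key_aggregates(parts, "year", {"year": "identity"}))
```

This assumes tables without deletes, as stated above; with delete files a partition could be listed in the metadata while containing no live rows.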






[jira] [Updated] (IMPALA-12087) Performance improvements on Iceberg table queries

2023-07-24 Thread Zoltán Borók-Nagy (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-12087:
---
Labels: impala-iceberg performance  (was: impala-iceberg)

> Performance improvements on Iceberg table queries
> -
>
> Key: IMPALA-12087
> URL: https://issues.apache.org/jira/browse/IMPALA-12087
> Project: IMPALA
>  Issue Type: Epic
>Reporter: Gabor Kaszab
>Priority: Major
>  Labels: impala-iceberg, performance
>







[jira] [Updated] (IMPALA-12088) Make the predicate pushdown skipping improvement work with the count(*) query rewrite improvement

2023-07-24 Thread Zoltán Borók-Nagy (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-12088:
---
Labels: impala-iceberg performance  (was: impala-iceberg)

> Make the predicate pushdown skipping improvement work with the count(*) query 
> rewrite improvement
> -
>
> Key: IMPALA-12088
> URL: https://issues.apache.org/jira/browse/IMPALA-12088
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Gabor Kaszab
>Priority: Major
>  Labels: impala-iceberg, performance
>
> https://issues.apache.org/jira/browse/IMPALA-11802 introduced a query rewrite 
> for count(*) queries on Iceberg tables as a performance improvement. Later on 
> https://issues.apache.org/jira/browse/IMPALA-11701 introduced the capability 
> to skip pushing down predicates to Impala scanners when Iceberg has already 
> applied them and they won't filter any further rows.
> This ticket is to connect these 2 improvements: when we skip pushing down 
> predicates, there are no more predicates to push down, and this is a count(*) 
> query, then let's do the query rewrite from IMPALA-11802.
> One difficulty in implementing this is that the query rewrite is done in the 
> analysis phase while the predicate pushdown decision is made in the planner 
> phase, so it might be a bit more complicated than just re-using the rewrite 
> code.
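
The condition described above can be written down as a small Python sketch. It is only a model of the decision, not planner code: `try_count_star_from_metadata` and its parameters are hypothetical, and `data_file_record_counts` stands for the per-file record counts of the data files that remain after Iceberg has applied the predicates.

```python
def try_count_star_from_metadata(is_count_star, remaining_predicates,
                                 data_file_record_counts):
    """Answer count(*) from file metadata when nothing is left to evaluate.

    Returns None when the rewrite does not apply and a normal scan is needed.
    """
    if not is_count_star or remaining_predicates:
        return None
    return sum(data_file_record_counts)


print(try_count_star_from_metadata(True, [], [100, 250, 50]))
print(try_count_star_from_metadata(True, ["amount > 100"], [100, 250, 50]))
```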






[jira] [Updated] (IMPALA-11238) Avoid the need for COMPUTE STATS for Iceberg tables

2023-07-24 Thread Zoltán Borók-Nagy (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-11238:
---
Labels: impala-iceberg performance  (was: impala-iceberg)

> Avoid the need for COMPUTE STATS for Iceberg tables
> ---
>
> Key: IMPALA-11238
> URL: https://issues.apache.org/jira/browse/IMPALA-11238
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg, performance
> Attachments: image-2022-04-12-21-07-45-357.png
>
>
> We still need to issue COMPUTE STATS for Iceberg tables to do proper planning.
> The main reason for this is that Iceberg metadata lacks NDV information about 
> columns at the table level.
> There are plans in Iceberg to store HyperLogLog arrays for data files, so 
> once we have that we could use that information.
> Until then, maybe we could use some heuristics from Iceberg metadata when 
> there is no precise NDV available.






[jira] [Updated] (IMPALA-12173) Push down predicates on sorting columns to Iceberg

2023-07-24 Thread Zoltán Borók-Nagy (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-12173:
---
Labels: impala-iceberg performance  (was: impala-iceberg)

> Push down predicates on sorting columns to Iceberg
> --
>
> Key: IMPALA-12173
> URL: https://issues.apache.org/jira/browse/IMPALA-12173
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg, performance
>
> Currently we only push down predicates to Iceberg if any of them refers to a 
> partitioning column. We do this because invoking Iceberg's planFiles() API is 
> very expensive, so we only want to do that if we can expect greater benefits.
> We could also push down predicates if any of them refers to a sorting column.
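
The decision rule can be sketched in Python as follows. The function and its inputs are hypothetical: `predicate_columns` holds, for each predicate, the set of column names it refers to. With an empty `sort_columns` the function models the current behavior; passing the table's sorting columns models the proposed extension.

```python
def should_push_down_to_iceberg(predicate_columns, partition_columns,
                                sort_columns=()):
    """Return True if any predicate refers to a partitioning or sorting column."""
    interesting = set(partition_columns) | set(sort_columns)
    return any(cols & interesting for cols in predicate_columns)


preds = [{"ts"}, {"amount"}]
print(should_push_down_to_iceberg(preds, {"country"}))
print(should_push_down_to_iceberg(preds, {"country"}, {"ts"}))
```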






[jira] [Resolved] (IMPALA-11655) Impala should set merge-on-read by default

2023-07-24 Thread Zoltán Borók-Nagy (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-11655.

Fix Version/s: Impala 4.2.0
   Resolution: Fixed

> Impala should set merge-on-read by default
> --
>
> Key: IMPALA-11655
> URL: https://issues.apache.org/jira/browse/IMPALA-11655
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.2.0
>
>
> Similarly to HIVE-26596 Impala should set merge-on-read delete mode for V2 
> tables, unless otherwise specified:
> * during table creation with 'format-version'='2'
> * during alter table set tblproperties 'format-version'='2'
> We do so because in the foreseeable future Impala will only support 
> merge-on-read (on the write-side, on the read side copy-on-write is also 
> supported). Also, currently Hive only supports merge-on-read for DELETEs and 
> UPDATEs.
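
A minimal Python sketch of the "unless otherwise specified" rule is shown below. The property names `write.delete.mode`, `write.update.mode`, and `write.merge.mode` are Iceberg table properties; which of them Impala actually sets is not stated in this ticket, so setting all three is an assumption, and `with_v2_defaults` is a hypothetical helper.

```python
# Assumed set of defaults; the ticket itself only mentions the delete mode.
MERGE_ON_READ_DEFAULTS = {
    "write.delete.mode": "merge-on-read",
    "write.update.mode": "merge-on-read",
    "write.merge.mode": "merge-on-read",
}


def with_v2_defaults(tblproperties):
    """Return a copy of the properties with merge-on-read defaults for V2 tables."""
    props = dict(tblproperties)
    if props.get("format-version") == "2":
        for key, value in MERGE_ON_READ_DEFAULTS.items():
            # setdefault keeps any mode the user specified explicitly.
            props.setdefault(key, value)
    return props


print(with_v2_defaults({"format-version": "2", "write.delete.mode": "copy-on-write"}))
```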






[jira] [Resolved] (IMPALA-11877) Add support for DELETE statements for Iceberg tables

2023-07-24 Thread Zoltán Borók-Nagy (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-11877.

Fix Version/s: Impala 4.3.0
   Resolution: Fixed

> Add support for DELETE statements for Iceberg tables
> 
>
> Key: IMPALA-11877
> URL: https://issues.apache.org/jira/browse/IMPALA-11877
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.3.0
>
>
> Add support for DELETE statements for Iceberg tables.
> We can do it based on the following design doc: 
> https://docs.google.com/document/d/1GuRiJ3jjqkwINsSCKYaWwcfXHzbMrsd3WEMDOB11Xqw/edit#heading=h.5bmfhbmb4qdk
> Limitations:
> * only support merge-on-read
> * only write position delete files






[jira] [Updated] (IMPALA-12299) Parallelize file listings of Iceberg tables on HDFS/Ozone

2023-07-24 Thread Zoltán Borók-Nagy (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-12299:
---
Labels: impala-iceberg performance  (was: impala-iceberg)

> Parallelize file listings of Iceberg tables on HDFS/Ozone
> -
>
> Key: IMPALA-12299
> URL: https://issues.apache.org/jira/browse/IMPALA-12299
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Zoltán Borók-Nagy
>Assignee: Peter Rozsa
>Priority: Major
>  Labels: impala-iceberg, performance
>
> *The following only affects HDFS/Ozone, where we need to contact the NameNode 
> to create file descriptors with block locations. On cloud object stores where 
> there are no block locations, we only need the Iceberg metadata to create the 
> file descriptors.*
> Currently we are doing one big recursive file listing on the table directory 
> to load all the files (with block locations as well) in an Iceberg table.
> Instead of this, we could look at the Iceberg metadata, identify the 
> partitions, then load the file descriptors in them in parallel.
> We cannot really reuse ParallelFileMetadataLoader in its current form as it 
> works on HdfsPartitions, and Iceberg tables are treated as non-partitioned 
> tables in the Impala Catalog, i.e. the actual Iceberg partitions are hidden 
> from it.
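
The proposed approach (one listing per partition directory, run in parallel) can be sketched in Python. The local filesystem and `os.listdir` stand in for HDFS/Ozone and the NameNode calls, and the function names are hypothetical; this is not how ParallelFileMetadataLoader is implemented.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def list_partition(directory):
    """List the regular files directly under one partition directory."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )


def list_partitions_in_parallel(partition_dirs, threads=8):
    """Run one non-recursive listing per partition directory on a thread pool."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        listings = pool.map(list_partition, partition_dirs)
        # Consume the lazy iterator while the pool is still open.
        return dict(zip(partition_dirs, listings))
```

In this model the partition directories would come from the Iceberg metadata rather than from walking the table directory.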






[jira] [Updated] (IMPALA-12298) Improve incremental load of Iceberg tables

2023-07-24 Thread Zoltán Borók-Nagy (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-12298:
---
Labels: impala-iceberg performance  (was: impala-iceberg)

> Improve incremental load of Iceberg tables
> --
>
> Key: IMPALA-12298
> URL: https://issues.apache.org/jira/browse/IMPALA-12298
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg, performance
>
> *The following mostly affects HDFS/Ozone, where we need to contact the 
> NameNode to create file descriptors with block locations. On cloud object 
> stores where there are no block locations, we only need the Iceberg metadata 
> to create the file descriptors.*
> Currently we always reload all the metadata belonging to an Iceberg table.
> This means we recreate all the file descriptors even if only a few of them 
> have changed.
> We could check the number of newly added files, and if there are only a few 
> of them then we should only load the file descriptors for those one by one.
> We can fall back to a full reload if a significant number of files have 
> changed, i.e. when it is better to use a recursive file listing.
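
The decision between incremental and full reload can be sketched in Python. `plan_reload` and the `max_incremental` threshold are hypothetical; the ticket does not say what counts as "a significant number" of files.

```python
def plan_reload(cached_files, current_files, max_incremental=100):
    """Decide which file descriptors to load after a table's metadata changed.

    cached_files: paths that already have file descriptors.
    current_files: paths referenced by the current Iceberg metadata.
    """
    cached, current = set(cached_files), set(current_files)
    added = sorted(current - cached)
    removed = sorted(cached - current)
    if len(added) > max_incremental:
        # Too many new files: one recursive listing is assumed to be cheaper.
        return {"mode": "full", "load": sorted(current), "drop": []}
    return {"mode": "incremental", "load": added, "drop": removed}


print(plan_reload(["a", "b"], ["b", "c"]))
```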






[jira] [Created] (IMPALA-12307) test_75_percent_availability fails on object stores

2023-07-24 Thread Zoltán Borók-Nagy (Jira)
Zoltán Borók-Nagy created IMPALA-12307:
--

 Summary: test_75_percent_availability fails on object stores
 Key: IMPALA-12307
 URL: https://issues.apache.org/jira/browse/IMPALA-12307
 Project: IMPALA
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


test_75_percent_availability fails on Ozone and S3.

The test expects the string "SCAN HDFS" to be found in the profile.
Instead, the profile contains "SCAN OZONE" or "SCAN S3".
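
One way a test could tolerate the different scan node names is sketched below in Python. This is not the actual fix for the ticket, and `profile_has_scan_node` is a hypothetical helper. The pattern only covers the three names mentioned here; other filesystems would need their own alternatives.

```python
import re

# Accept any of the scan node names mentioned in this ticket.
SCAN_NODE = re.compile(r"SCAN (HDFS|OZONE|S3)\b")


def profile_has_scan_node(profile):
    """Return True if the profile text contains a filesystem scan node."""
    return SCAN_NODE.search(profile) is not None


print(profile_has_scan_node("00:SCAN OZONE [functional.alltypes]"))
```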








[jira] [Assigned] (IMPALA-12307) test_75_percent_availability fails on object stores

2023-07-24 Thread Zoltán Borók-Nagy (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-12307:
--

Assignee: Riza Suminto

> test_75_percent_availability fails on object stores
> ---
>
> Key: IMPALA-12307
> URL: https://issues.apache.org/jira/browse/IMPALA-12307
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Riza Suminto
>Priority: Major
>  Labels: broken-build
>
> test_75_percent_availability fails on Ozone and S3.
> The test expects the string "SCAN HDFS" to be found in the profile.
> Instead, the profile contains "SCAN OZONE" or "SCAN S3".






[jira] [Assigned] (IMPALA-12243) Add support for DROP PARTITION for Iceberg tables

2023-07-24 Thread Peter Rozsa (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Rozsa reassigned IMPALA-12243:


Assignee: Peter Rozsa

> Add support for DROP PARTITION for Iceberg tables
> -
>
> Key: IMPALA-12243
> URL: https://issues.apache.org/jira/browse/IMPALA-12243
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Peter Rozsa
>Priority: Major
>  Labels: impala-iceberg
>
> Add support for DROP PARTITION for Iceberg tables.
> Users should be able to run statements like the following:
> * alter table table_a drop partition (country = 'SG')
> * alter table table_a drop partition (identity(country) = 'SG')
> * alter table table_a drop partition (dt < '2023-01-01 00:00:00')
> * alter table table_a drop partition (year(dt) < 2023)
> * alter table table_a drop partition (year(dt) < 2023 and month(dt) < 6)






[jira] [Assigned] (IMPALA-12299) Parallelize file listings of Iceberg tables on HDFS/Ozone

2023-07-24 Thread Peter Rozsa (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Rozsa reassigned IMPALA-12299:


Assignee: Peter Rozsa

> Parallelize file listings of Iceberg tables on HDFS/Ozone
> -
>
> Key: IMPALA-12299
> URL: https://issues.apache.org/jira/browse/IMPALA-12299
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Zoltán Borók-Nagy
>Assignee: Peter Rozsa
>Priority: Major
>  Labels: impala-iceberg
>
> *The following only affects HDFS/Ozone, where we need to contact the NameNode 
> to create file descriptors with block locations. On cloud object stores where 
> there are no block locations, we only need the Iceberg metadata to create the 
> file descriptors.*
> Currently we are doing one big recursive file listing on the table directory 
> to load all the files (with block locations as well) in an Iceberg table.
> Instead of this, we could look at the Iceberg metadata, identify the 
> partitions, then load the file descriptors in them in parallel.
> We cannot really reuse ParallelFileMetadataLoader in its current form as it 
> works on HdfsPartitions, and Iceberg tables are treated as non-partitioned 
> tables in the Impala Catalog, i.e. the actual Iceberg partitions are hidden 
> from it.






[jira] [Created] (IMPALA-12306) Make TestCodegenCache.{test_codegen_cache_with_asm_module_dir,test_codegen_cache_with_perf_map} more robust

2023-07-24 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-12306:
--

 Summary: Make 
TestCodegenCache.{test_codegen_cache_with_asm_module_dir,test_codegen_cache_with_perf_map}
 more robust
 Key: IMPALA-12306
 URL: https://issues.apache.org/jira/browse/IMPALA-12306
 Project: IMPALA
  Issue Type: Bug
Reporter: Daniel Becker
Assignee: Daniel Becker


The tests 
TestCodegenCache.{test_codegen_cache_with_asm_module_dir,test_codegen_cache_with_perf_map}
 introduced by [IMPALA-12260|http://issues.apache.org/jira/browse/IMPALA-12260] 
were added to ensure we don't crash because of a use-after-free. However, 
use-after-free is undefined behaviour and does not guarantee a crash, so the 
tests don't necessarily crash without the fix of 
[IMPALA-12260|http://issues.apache.org/jira/browse/IMPALA-12260]. We should 
find a way to make these tests detect the conditions that lead to this 
use-after-free.


