[Impala-ASF-CR] IMPALA-10156: test_unmatched_schema should use unique_database
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16758 ) Change subject: IMPALA-10156: test_unmatched_schema should use unique_database .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16758 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I66c5388d62f87795176b20243a4ccca70412b18c Gerrit-Change-Number: 16758 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Sat, 21 Nov 2020 06:54:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10156: test_unmatched_schema should use unique_database
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16758 ) Change subject: IMPALA-10156: test_unmatched_schema should use unique_database .. IMPALA-10156: test_unmatched_schema should use unique_database This updates the test to use the unique_database fixture instead of trying to generate a unique table name on its own. Change-Id: I66c5388d62f87795176b20243a4ccca70412b18c Reviewed-on: http://gerrit.cloudera.org:8080/16758 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M tests/query_test/test_scanners.py 1 file changed, 14 insertions(+), 18 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16758 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I66c5388d62f87795176b20243a4ccca70412b18c Gerrit-Change-Number: 16758 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 ) Change subject: IMPALA-10314: Optimize planning time for simple limits .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6689/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 10 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Nov 2020 05:46:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 ) Change subject: IMPALA-10314: Optimize planning time for simple limits .. Patch Set 10: Code-Review+2 Rebased on latest master. Carry forward +2 . -- To view, visit http://gerrit.cloudera.org:8080/16723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 10 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Nov 2020 05:36:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 ) Change subject: IMPALA-10314: Optimize planning time for simple limits .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7711/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 9 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Nov 2020 04:17:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 ) Change subject: IMPALA-10314: Optimize planning time for simple limits .. Patch Set 9: > Patch Set 8: > > > Patch Set 8: > > > > > Patch Set 8: > > > > > > > Patch Set 8: Verified-1 > > > > > > > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6687/ > > > > > > There's 1 failure: > > > [gw1] FAILED > > > query_test/test_exprs.py::TestExprLimits::test_statement_expression_limit > > > > > > However, on my desktop I ran query_test/test_exprs.py and all tests under > > > it passed. > > > > On Jenkins this test hit an OOM GC overhead limit exceeded: > > gw1] linux2 -- Python 2.7.16 > > /home/ubuntu/Impala/bin/../infra/python/env-gcc7.5.0/bin/python > > query_test/test_exprs.py:176: in test_statement_expression_limit > > assert re.search(expected_err_re, str(err)) > > E assert None > > E+ where None = ('Exceeded the > > statement expression limit \\(25\\)\nStatement has .* expressions.', > > "ImpalaBeeswaxException:\n INNER EXCEPTION: > 'beeswaxd.ttypes.BeeswaxException'>\n MESSAGE: OutOfMemoryError: GC > > overhead limit exceeded\n") > > E+where = re.search > > E+and "ImpalaBeeswaxException:\n INNER EXCEPTION: > 'beeswaxd.ttypes.BeeswaxException'>\n MESSAGE: OutOfMemoryError: GC > > overhead limit exceeded\n" = str(ImpalaBeeswaxException()) > > Oh wait, there's actually another failure in FE: > > 23:47:52 [INFO] Running org.apache.impala.analysis.ParserTest > 23:47:52 [ERROR] Tests run: 98, Failures: 1, Errors: 0, Skipped: 0, Time > elapsed: 0.829 s <<< FAILURE! - in org.apache.impala.analysis.ParserTest > 23:47:52 [ERROR] TestGetErrorMsg(org.apache.impala.analysis.ParserTest) Time > elapsed: 0.005 s <<< FAILURE! > at > org.apache.impala.analysis.ParserTest.ParserError(ParserTest.java:77) > at > org.apache.impala.analysis.ParserTest.TestGetErrorMsg(ParserTest.java:3475) > > Will need to look into this one. 
PS9 fixes the ParserTest issue: I just had to update the list of expected keywords in the test after a WHERE clause. Not yet sure about the other OOM failure since I cannot repro it locally. -- To view, visit http://gerrit.cloudera.org:8080/16723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 9 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Nov 2020 03:58:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits
Hello Qifan Chen, Shant Hovsepian, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16723 to look at the new patch set (#9). Change subject: IMPALA-10314: Optimize planning time for simple limits .. IMPALA-10314: Optimize planning time for simple limits This patch optimizes the planning time for simple limit queries by only considering a minimal set of partitions whose file descriptors add up to N (the specified limit). Each file is conservatively estimated to contain 1 row. This reduces the number of partitions processed by HdfsScanNode.computeScanRangeLocations() which, according to query profiling, has been the main contributor to planning time, especially for a large number of partitions. Further, within each partition, we only consider as many non-empty files as are needed to bring the total to N. This is an opt-in optimization. A new planner option OPTIMIZE_SIMPLE_LIMIT enables this optimization. Further, if there's a WHERE clause, it must have an 'always_true' hint in order for the optimization to be considered. For example: set optimize_simple_limit = true; SELECT * FROM T WHERE /* +always_true */ <predicate> LIMIT 10; If there are too many empty files in the partitions, the query may return fewer than N rows, although the rows it does return are still valid. Testing: - Added planner tests for the optimization - Ran query_test.py tests with optimize_simple_limit enabled - Added an e2e test. Since result rows are non-deterministic, only a simple count(*) query on top of a subquery with a limit was added. 
Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java M fe/src/main/java/org/apache/impala/analysis/Predicate.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test M testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test 17 files changed, 507 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16723/9 -- To view, visit http://gerrit.cloudera.org:8080/16723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 9 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
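A minimal Python sketch of the selection rule described in this commit message, for illustration only: the real logic lives in the Java frontend (HdfsScanNode, HdfsPartitionPruner) and may differ. It assumes each partition is given as a hypothetical (name, file sizes) pair, treats a zero-byte file as empty, and, like the commit message, counts every non-empty file as one row. The function name and data layout are made up for this example.

```python
from typing import Dict, List, Tuple


def select_files_for_limit(partitions: List[Tuple[str, List[int]]],
                           limit: int) -> Dict[str, List[int]]:
    """Walk partitions in order and keep just enough non-empty files to
    reach `limit`, counting each non-empty file as one row (the
    conservative estimate).

    `partitions` holds hypothetical (partition_name, file_sizes_in_bytes)
    pairs; a size of 0 marks an empty file, which is never selected.
    """
    selected: Dict[str, List[int]] = {}
    remaining = limit
    for name, file_sizes in partitions:
        if remaining <= 0:
            break  # Enough files already; later partitions are never touched.
        non_empty = [size for size in file_sizes if size > 0]
        if not non_empty:
            continue  # Partitions with only empty files contribute nothing.
        chosen = non_empty[:remaining]
        selected[name] = chosen
        remaining -= len(chosen)
    return selected
```

For example, with partitions [("p=1", [100, 0, 200]), ("p=2", [50]), ("p=3", [10, 20])] and a limit of 3, the sketch keeps both non-empty files of p=1 and the single file of p=2 and never looks at p=3. Because one row per file is only a lower bound, stopping early goes wrong only when a counted file actually holds no rows, which matches the fewer-rows caveat in the commit message.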
[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 ) Change subject: IMPALA-10314: Optimize planning time for simple limits .. Patch Set 8: > Patch Set 8: > > > Patch Set 8: > > > > > Patch Set 8: Verified-1 > > > > > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6687/ > > > > There's 1 failure: > > [gw1] FAILED > > query_test/test_exprs.py::TestExprLimits::test_statement_expression_limit > > > > However, on my desktop I ran query_test/test_exprs.py and all tests under > > it passed. > > On Jenkins this test hit an OOM GC overhead limit exceeded: > gw1] linux2 -- Python 2.7.16 > /home/ubuntu/Impala/bin/../infra/python/env-gcc7.5.0/bin/python > query_test/test_exprs.py:176: in test_statement_expression_limit > assert re.search(expected_err_re, str(err)) > E assert None > E+ where None = ('Exceeded the > statement expression limit \\(25\\)\nStatement has .* expressions.', > "ImpalaBeeswaxException:\n INNER EXCEPTION: 'beeswaxd.ttypes.BeeswaxException'>\n MESSAGE: OutOfMemoryError: GC overhead > limit exceeded\n") > E+where = re.search > E+and "ImpalaBeeswaxException:\n INNER EXCEPTION: 'beeswaxd.ttypes.BeeswaxException'>\n MESSAGE: OutOfMemoryError: GC overhead > limit exceeded\n" = str(ImpalaBeeswaxException()) Oh wait, there's actually another failure in FE: 23:47:52 [INFO] Running org.apache.impala.analysis.ParserTest 23:47:52 [ERROR] Tests run: 98, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.829 s <<< FAILURE! - in org.apache.impala.analysis.ParserTest 23:47:52 [ERROR] TestGetErrorMsg(org.apache.impala.analysis.ParserTest) Time elapsed: 0.005 s <<< FAILURE! at org.apache.impala.analysis.ParserTest.ParserError(ParserTest.java:77) at org.apache.impala.analysis.ParserTest.TestGetErrorMsg(ParserTest.java:3475) Will need to look into this one. 
-- To view, visit http://gerrit.cloudera.org:8080/16723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 8 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Nov 2020 03:24:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 ) Change subject: IMPALA-10314: Optimize planning time for simple limits .. Patch Set 8: > Patch Set 8: > > > Patch Set 8: Verified-1 > > > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6687/ > > There's 1 failure: > [gw1] FAILED > query_test/test_exprs.py::TestExprLimits::test_statement_expression_limit > > However, on my desktop I ran query_test/test_exprs.py and all tests under it > passed. On Jenkins this test hit an OOM GC overhead limit exceeded: gw1] linux2 -- Python 2.7.16 /home/ubuntu/Impala/bin/../infra/python/env-gcc7.5.0/bin/python query_test/test_exprs.py:176: in test_statement_expression_limit assert re.search(expected_err_re, str(err)) E assert None E+ where None = ('Exceeded the statement expression limit \\(25\\)\nStatement has .* expressions.', "ImpalaBeeswaxException:\n INNER EXCEPTION: \n MESSAGE: OutOfMemoryError: GC overhead limit exceeded\n") E+where = re.search E+and "ImpalaBeeswaxException:\n INNER EXCEPTION: \n MESSAGE: OutOfMemoryError: GC overhead limit exceeded\n" = str(ImpalaBeeswaxException()) -- To view, visit http://gerrit.cloudera.org:8080/16723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 8 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Nov 2020 03:14:22 + Gerrit-HasComments: No
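The failing assertion can be reproduced in isolation: the test expects the analysis error matched by the regex in the log, but the JVM ran out of memory first, so re.search() returns None. The pattern below is copied from the log; the helper name is hypothetical and the sample messages in the usage note are illustrative.

```python
import re

# Regex copied from the assertion in query_test/test_exprs.py, as printed
# in the Jenkins output above.
EXPECTED_ERR_RE = ('Exceeded the statement expression limit \\(25\\)\n'
                   'Statement has .* expressions.')


def has_expected_error(err_text: str) -> bool:
    """True if err_text contains the statement-expression-limit error."""
    return re.search(EXPECTED_ERR_RE, err_text) is not None
```

Passing the OOM text from the log ("... MESSAGE: OutOfMemoryError: GC overhead limit exceeded") returns False, which corresponds to the `assert None` failure shown; a message containing "Exceeded the statement expression limit (25)" followed by a newline and "Statement has 26 expressions." returns True.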
[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 ) Change subject: IMPALA-10314: Optimize planning time for simple limits .. Patch Set 8: > Patch Set 8: Verified-1 > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6687/ There's 1 failure: [gw1] FAILED query_test/test_exprs.py::TestExprLimits::test_statement_expression_limit However, on my desktop I ran query_test/test_exprs.py and all tests under it passed. -- To view, visit http://gerrit.cloudera.org:8080/16723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 8 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Nov 2020 03:09:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 ) Change subject: IMPALA-10314: Optimize planning time for simple limits .. Patch Set 8: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6687/ -- To view, visit http://gerrit.cloudera.org:8080/16723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 8 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Nov 2020 02:48:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10258: Fixed flaky TestQueryRetries.test_original_query_cancel
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16763 ) Change subject: IMPALA-10258: Fixed flaky TestQueryRetries.test_original_query_cancel .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7710/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16763 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib89f7b01a0f2a66a97f312e779a4ab04f4f347f3 Gerrit-Change-Number: 16763 Gerrit-PatchSet: 1 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Sat, 21 Nov 2020 02:08:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10312: bump timeout in test_ddl_queries_are_closed
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16762 ) Change subject: IMPALA-10312: bump timeout in test_ddl_queries_are_closed .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7709/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5885df6494122dffe2bbc6877cec3b90a9eb4ec6 Gerrit-Change-Number: 16762 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 21 Nov 2020 01:58:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10258: Fixed flaky TestQueryRetries.test_original_query_cancel
Wenzhe Zhou has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16763 Change subject: IMPALA-10258: Fixed flaky TestQueryRetries.test_original_query_cancel .. IMPALA-10258: Fixed flaky TestQueryRetries.test_original_query_cancel When TestQueryRetries.test_original_query_cancel was run on S3 with the query option spool_query_results enabled, the query timed out before reaching the expected state. This patch doubles the query timeout when the test runs on S3, and doubles the timeout for the query to reach the "FINISHED" state. For IMPALA-10109, test_retries_from_cancellation_pool did not trigger a query retry when one of the impalads was killed. It seems that the membership update message was not received and processed by the coordinator before the query reached a terminal state, so the query retry was not triggered. This patch reduces heartbeat_frequency and max_missed_heartbeats so that the statestore takes much less time to update membership after an impalad is killed, allowing the coordinator to start the query retry. Testing: - Ran the two tests in a loop for more than 3 hours; the failures did not recur. Change-Id: Ib89f7b01a0f2a66a97f312e779a4ab04f4f347f3 --- M tests/custom_cluster/test_query_retries.py 1 file changed, 12 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/16763/1 -- To view, visit http://gerrit.cloudera.org:8080/16763 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib89f7b01a0f2a66a97f312e779a4ab04f4f347f3 Gerrit-Change-Number: 16763 Gerrit-PatchSet: 1 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Thomas Tauber-Marshall
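The two adjustments in this patch boil down to simple arithmetic, sketched below in Python. The helper names are hypothetical, and the failure-detection estimate is a rough approximation: it assumes the statestore declares an impalad dead after max_missed_heartbeats consecutive missed heartbeats sent at a fixed interval, and it ignores RPC timeouts.

```python
def query_timeout_s(base_timeout_s: float, running_on_s3: bool) -> float:
    """Double the wait timeout on S3, where the test tends to run slower."""
    return base_timeout_s * 2 if running_on_s3 else base_timeout_s


def failure_detection_time_ms(heartbeat_interval_ms: int,
                              max_missed_heartbeats: int) -> int:
    """Rough estimate of how long the statestore needs to notice a killed
    impalad: it must miss `max_missed_heartbeats` consecutive heartbeats,
    one every `heartbeat_interval_ms` (RPC timeouts are ignored here)."""
    return heartbeat_interval_ms * max_missed_heartbeats
```

With illustrative values (not the ones used in the patch), a 1000 ms interval and 10 missed heartbeats give about 10 s before the coordinator learns about the killed impalad, while 100 ms and 2 give about 0.2 s, which is why lowering both settings lets the retry start before the query reaches a terminal state.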
[Impala-ASF-CR] IMPALA-10312: bump timeout in test_ddl_queries_are_closed
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16762 Change subject: IMPALA-10312: bump timeout in test_ddl_queries_are_closed .. IMPALA-10312: bump timeout in test_ddl_queries_are_closed This increases the timeout for waiting for the queries to be closed from 10s to 30s, on the theory that the test failure is caused by random slowness. Change-Id: I5885df6494122dffe2bbc6877cec3b90a9eb4ec6 --- M tests/shell/test_shell_interactive.py 1 file changed, 5 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/16762/1 -- To view, visit http://gerrit.cloudera.org:8080/16762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I5885df6494122dffe2bbc6877cec3b90a9eb4ec6 Gerrit-Change-Number: 16762 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong
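Raising a timeout like this usually means polling a condition until a deadline. Below is a minimal sketch of such a helper that assumes nothing about the real test code; wait_until is a hypothetical name, not the function used in test_shell_interactive.py.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout_s: float = 30.0,
               interval_s: float = 0.1) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` elapses.

    Returns True on success and False on timeout, so the caller can decide
    whether to fail the test or dump extra debug state first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A test could call wait_until(lambda: count_open_queries() == 0, timeout_s=30), where count_open_queries is likewise hypothetical. A longer timeout only costs time when the condition is slow to become true; a passing run still returns as soon as the predicate holds.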
[Impala-ASF-CR] IMPALA-9050: fix TestScanRangeLengths params
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16761 ) Change subject: IMPALA-9050: fix TestScanRangeLengths params .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7708/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16761 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b8591335dcdc85ce27674b35661444a46d30d5a Gerrit-Change-Number: 16761 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 21 Nov 2020 01:29:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10156: test_unmatched_schema should use unique_database
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16758 ) Change subject: IMPALA-10156: test_unmatched_schema should use unique_database .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6688/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16758 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I66c5388d62f87795176b20243a4ccca70412b18c Gerrit-Change-Number: 16758 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Sat, 21 Nov 2020 01:29:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10156: test_unmatched_schema should use unique_database
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16758 ) Change subject: IMPALA-10156: test_unmatched_schema should use unique_database .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16758 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I66c5388d62f87795176b20243a4ccca70412b18c Gerrit-Change-Number: 16758 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Sat, 21 Nov 2020 01:29:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4238: make TestClientSsl more robust
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16760 ) Change subject: IMPALA-4238: make TestClientSsl more robust .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7707/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0c884f76659005e7245a156ee33c249b86662b75 Gerrit-Change-Number: 16760 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 21 Nov 2020 01:15:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9050: fix TestScanRangeLengths params
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16761 Change subject: IMPALA-9050: fix TestScanRangeLengths params .. IMPALA-9050: fix TestScanRangeLengths params This test is only relevant for HDFS-based table formats. The option under test does not affect behaviour for Kudu or HBase. Change-Id: I9b8591335dcdc85ce27674b35661444a46d30d5a --- M tests/query_test/test_scanners.py 1 file changed, 3 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/16761/1 -- To view, visit http://gerrit.cloudera.org:8080/16761 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I9b8591335dcdc85ce27674b35661444a46d30d5a Gerrit-Change-Number: 16761 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong
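The fix restricts the test's parameters to HDFS-based formats. Below is a standalone sketch of that filtering step. The real test uses Impala's test-matrix constraints, so the format names, the constant and the helper here are assumptions for illustration.

```python
from typing import Iterable, List

# Formats served by non-HDFS scan nodes (assumed list for this sketch);
# the scan-range option under test has no effect on them.
NON_HDFS_FORMATS = frozenset({"kudu", "hbase"})


def hdfs_based_formats(file_formats: Iterable[str]) -> List[str]:
    """Keep only table formats scanned through HDFS, preserving order."""
    return [fmt for fmt in file_formats if fmt not in NON_HDFS_FORMATS]
```

For example, hdfs_based_formats(["parquet", "kudu", "text"]) keeps "parquet" and "text", so no test variant is spent on a format where the option cannot change behaviour.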
[Impala-ASF-CR] IMPALA-4238: make TestClientSsl more robust
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16760 Change subject: IMPALA-4238: make TestClientSsl more robust .. IMPALA-4238: make TestClientSsl more robust This changes the test to wait until the query is executing in the backend before trying to cancel it. This should remove planning time as a variable that might cause the test to be flaky (e.g. if planning is slow on S3 because of the time taken to list files). It also dumps the /queries debug page when the assertion fails, to aid debugging. Change-Id: I0c884f76659005e7245a156ee33c249b86662b75 --- M tests/custom_cluster/test_client_ssl.py 1 file changed, 8 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/16760/1 -- To view, visit http://gerrit.cloudera.org:8080/16760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I0c884f76659005e7245a156ee33c249b86662b75 Gerrit-Change-Number: 16760 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong
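The new ordering (wait for execution, then cancel, and dump diagnostics on failure) can be sketched as below. The callables stand in for the real client and debug-page accessors, and treating "RUNNING" as the executing state is an assumption. None of these names come from the patch.

```python
import time
from typing import Callable


class QueryNotExecutingError(AssertionError):
    """Raised when the query never reaches the executing state in time."""


def cancel_when_executing(get_state: Callable[[], str],
                          cancel: Callable[[], None],
                          dump_debug: Callable[[], str],
                          timeout_s: float = 60.0,
                          interval_s: float = 0.1) -> None:
    """Poll the query state until it is executing, then cancel the query.

    If the executing state is not reached before the timeout, fail with a
    debug dump (for example, the contents of the /queries page) attached.
    """
    deadline = time.monotonic() + timeout_s
    # Treating "RUNNING" as the executing state is an assumption here.
    while get_state() != "RUNNING":
        if time.monotonic() >= deadline:
            raise QueryNotExecutingError(
                "query never started executing; debug dump:\n" + dump_debug())
        time.sleep(interval_s)
    # Cancel only once execution has begun, so slow planning cannot race it.
    cancel()
```

Because cancellation waits on an observed state rather than a fixed delay, slow planning (for example, listing files on S3) only lengthens the wait instead of changing which code path the cancel hits.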
[Impala-ASF-CR] IMPALA-10156: test_unmatched_schema should use unique_database
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16758 ) Change subject: IMPALA-10156: test_unmatched_schema should use unique_database .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16758 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I66c5388d62f87795176b20243a4ccca70412b18c Gerrit-Change-Number: 16758 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Sat, 21 Nov 2020 00:48:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10156: test_unmatched_schema should use unique_database
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16758 ) Change subject: IMPALA-10156: test_unmatched_schema should use unique_database .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7706/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16758 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I66c5388d62f87795176b20243a4ccca70412b18c Gerrit-Change-Number: 16758 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Sat, 21 Nov 2020 00:24:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10156: test_unmatched_schema should use unique_database
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16758 Change subject: IMPALA-10156: test_unmatched_schema should use unique_database .. IMPALA-10156: test_unmatched_schema should use unique_database This updates the test to use the unique_database fixture instead of trying to generate a unique table name on its own. Change-Id: I66c5388d62f87795176b20243a4ccca70412b18c --- M tests/query_test/test_scanners.py 1 file changed, 14 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/16758/1 -- To view, visit http://gerrit.cloudera.org:8080/16758 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I66c5388d62f87795176b20243a4ccca70412b18c Gerrit-Change-Number: 16758 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong
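Impala's test framework provides a unique_database pytest fixture. The sketch below is not its implementation, only a standalone Python illustration of the idea: create a randomly named database, hand its name to the test, and always drop it with CASCADE afterwards, so tables never collide across concurrent runs. The execute callable and the naming scheme are hypothetical.

```python
import uuid
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def unique_database(execute: Callable[[str], None],
                    prefix: str = "test_db") -> Iterator[str]:
    """Create a uniquely named database, yield its name, and always drop it.

    `execute` is any callable that runs one SQL statement (hypothetical).
    """
    db_name = "{0}_{1}".format(prefix, uuid.uuid4().hex[:8])
    execute("CREATE DATABASE {0}".format(db_name))
    try:
        yield db_name
    finally:
        # Runs even if the test body raises, so no database is leaked.
        execute("DROP DATABASE IF EXISTS {0} CASCADE".format(db_name))
```

Inside the with block, a test creates tables as "{db}.tbl" instead of inventing its own unique table name; isolation comes from the database name, and cleanup happens in one place.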
[Impala-ASF-CR] IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/16720/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16720/12//COMMIT_MSG@9 PS12, Line 9: This patch adds the logic to utilize min/max stats > Does this patch also leads to utilizing min/max filters per-row, similarly I think this would be a good thing to do (I think the patch does this automatically). One question I have is, if this is the case, whether the min/max filter is evaluated before the bloom filter or vice-versa. That might have some perf implications. It's not clear to me which order is better or whether it really matters. -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 12 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 20 Nov 2020 23:17:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 12: (3 comments) I have some high level comments. I plan to go through the patch in more detail later. http://gerrit.cloudera.org:8080/#/c/16720/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16720/12//COMMIT_MSG@9 PS12, Line 9: This patch adds the logic to utilize min/max stats Does this patch also lead to utilizing min/max filters per-row, similarly to bloom filters? http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc@549 PS12, Line 549: if ( eval_min_max ) { I am wondering if it is possible to handle min/max runtime filters more similarly to existing stat filtering. A possible idea is to split the new filter into distinct data_min>join_max and data_maxhttps://github.com/apache/impala/blob/master/be/src/exec/parquet/parquet-column-stats.cc#L87 http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc@862 PS12, Line 862: TYPE_DATETIME You meant TYPE_TIMESTAMP, right? DATETIME is completely unsupported in Impala -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 12 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 20 Nov 2020 22:33:18 + Gerrit-HasComments: Yes
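Page skipping with a min/max runtime filter reduces to an interval-overlap test: a page can be skipped when every value in it lies outside the join's [min, max] range. The first condition, data_min > join_max, appears in the review comment; the symmetric data_max < join_min condition is assumed here. This sketch ignores NULL handling and type-specific comparisons (for example TIMESTAMP), and its function names are hypothetical.

```python
def can_skip_page(page_min, page_max, filter_min, filter_max):
    """True when no value in [page_min, page_max] can equal a join key in
    [filter_min, filter_max], i.e. the two ranges do not overlap."""
    return page_min > filter_max or page_max < filter_min


def pages_to_read(page_stats, filter_min, filter_max):
    """Indices of pages whose (min, max) range overlaps the filter range."""
    return [i for i, (lo, hi) in enumerate(page_stats)
            if not can_skip_page(lo, hi, filter_min, filter_max)]
```

For pages with ranges (1, 5), (6, 10) and (11, 20) and a filter range of [7, 12], only the last two pages need to be read. Ranges that merely touch, such as a page ending at 5 and a filter starting at 5, still count as overlapping, so no matching row can be skipped.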
[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog
Zoltan Borok-Nagy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16721 ) Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog .. IMPALA-10152: Add support for Iceberg HiveCatalog HiveCatalog is one of Iceberg's catalog implementations. It uses the Hive metastore and it is the recommended catalog implementation when the table data is stored in object stores like S3. This commit updates the Iceberg version to a newer one, and it also retrieves Iceberg from the CDP distribution because that version of Iceberg is built against Hive 3 (Impala is only compatible with Hive 3). This commit makes HiveCatalog the default Iceberg catalog in Impala because it can be used in more environments (e.g. cloud stores), and it is more featureful. Also, other engines that store their table metadata in HMS will probably use HiveCatalog as well. Tables stored in HiveCatalog are similar to Kudu tables with HMS integration, i.e. modifying an Iceberg table via the Iceberg APIs also modifies the HMS table. So in CatalogOpExecutor we handle such Iceberg tables similarly to integrated Kudu tables. 
Testing: * Added e2e tests for creating, writing, and altering Iceberg tables * Added SHOW CREATE TABLE tests Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 Reviewed-on: http://gerrit.cloudera.org:8080/16721 Reviewed-by: wangsheng Tested-by: Impala Public Jenkins --- M bin/impala-config.sh M common/thrift/CatalogObjects.thrift M fe/pom.xml M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 16 files changed, 577 insertions(+), 92 deletions(-) Approvals: wangsheng: Looks good to me, approved Impala Public Jenkins: Verified -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 Gerrit-Change-Number: 16721 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng
[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 ) Change subject: IMPALA-10314: Optimize planning time for simple limits .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6687/ DRY_RUN=false -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 8 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Nov 2020 21:20:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 ) Change subject: IMPALA-10314: Optimize planning time for simple limits .. Patch Set 7: Code-Review+2 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 7 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Nov 2020 21:20:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 ) Change subject: IMPALA-10314: Optimize planning time for simple limits .. Patch Set 8: Code-Review+2 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 8 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Nov 2020 21:20:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10334 test stats extrapolation output doesn't match on erasure coding build
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16756 ) Change subject: IMPALA-10334 test_stats_extrapolation output doesn't match on erasure coding build .. Patch Set 2: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/16756/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16756/2//COMMIT_MSG@7 PS2, Line 7: IMPALA-10334 test_stats_extrapolation output doesn't match on erasure coding build nit: we usually put a ':' after the JIRA ID. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16c11aa0a1ec2d4569c272d2454915041039f950 Gerrit-Change-Number: 16756 Gerrit-PatchSet: 2 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 20 Nov 2020 20:57:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10334 test stats extrapolation output doesn't match on erasure coding build
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16756 ) Change subject: IMPALA-10334 test_stats_extrapolation output doesn't match on erasure coding build .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7705/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16c11aa0a1ec2d4569c272d2454915041039f950 Gerrit-Change-Number: 16756 Gerrit-PatchSet: 2 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 20 Nov 2020 20:46:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 ) Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog .. Patch Set 7: Verified+1 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 Gerrit-Change-Number: 16721 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 20 Nov 2020 20:39:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10334 test stats extrapolation output doesn't match on erasure coding build
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16756 ) Change subject: IMPALA-10334 test_stats_extrapolation output doesn't match on erasure coding build .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7704/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16c11aa0a1ec2d4569c272d2454915041039f950 Gerrit-Change-Number: 16756 Gerrit-PatchSet: 1 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 20 Nov 2020 20:28:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10334 test stats extrapolation output doesn't match on erasure coding build
Qifan Chen has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/16756 ) Change subject: IMPALA-10334 test_stats_extrapolation output doesn't match on erasure coding build .. IMPALA-10334 test_stats_extrapolation output doesn't match on erasure coding build This patch skips test_stats_extrapolation for erasure-coding builds. The reason is that the scan section of the explain output can include an extra erasure coding information line when an HDFS table is erasure coded. This makes the explain output differ between a normal build and an erasure-coding build. A new reason, 'contain_full_explain', is added to SkipIfEC to support this. Testing: Ran the erasure coding version of the EE and CLUSTER tests. Ran core tests. Change-Id: I16c11aa0a1ec2d4569c272d2454915041039f950 --- M tests/common/skip.py M tests/metadata/test_stats_extrapolation.py 2 files changed, 4 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/16756/2 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I16c11aa0a1ec2d4569c272d2454915041039f950 Gerrit-Change-Number: 16756 Gerrit-PatchSet: 2 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins
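The skip mechanism described in the IMPALA-10334 commit message can be sketched as a reusable decorator stored on a SkipIfEC-style class. Impala's test suite is pytest-based, so the real marker is presumably a `pytest.mark.skipif`; to keep this sketch stdlib-only it uses `unittest.skipIf` instead. Detecting an erasure-coded build through an `ERASURE_CODING` environment variable is an assumption, and the reason string is illustrative rather than copied from tests/common/skip.py.

```python
import os
import unittest

# Assumption: erasure-coded builds are flagged by an ERASURE_CODING environment
# variable. The real detection logic used by tests/common/skip.py may differ.
IS_EC = os.environ.get("ERASURE_CODING", "false").lower() == "true"


class SkipIfEC:
    """Reusable skip decorators for erasure-coded builds.

    Stdlib stand-in for pytest.mark.skipif markers; the reason text is illustrative.
    """
    # On EC builds the scan section of the explain output gains an extra
    # erasure coding line, so tests that compare full explain output are skipped.
    contain_full_explain = unittest.skipIf(
        IS_EC, "Explain output contains an extra erasure coding line")


class TestStatsExtrapolation(unittest.TestCase):
    @SkipIfEC.contain_full_explain
    def test_stats_extrapolation(self):
        # Placeholder body standing in for the real explain-output comparison.
        self.assertEqual(1 + 1, 2)
```

Saved as a module and run with `python -m unittest`, the test runs normally by default and is reported as skipped when `ERASURE_CODING=true` is set. Keeping the condition in one named attribute lets other tests that compare full explain output reuse the same reason.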
[Impala-ASF-CR] [WIP] IMPALA-10334 test stats extrapolation output doesn't match on erasure coding build
Qifan Chen has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16756 ) Change subject: [WIP] IMPALA-10334 test_stats_extrapolation output doesn't match on erasure coding build .. [WIP] IMPALA-10334 test_stats_extrapolation output doesn't match on erasure coding build This patch skips test_stats_extrapolation for erasure-coding builds. The reason is that an extra information line can be included in the scan section of the explain output when an HDFS table is erasure coded, which makes the explain output non-deterministic. A new reason, 'contain_full_explain', is added to SkipIfEC to support this. Testing: Ran the erasure coding version of the EE and CLUSTER tests. Ran core tests. Change-Id: I16c11aa0a1ec2d4569c272d2454915041039f950 --- M tests/common/skip.py M tests/metadata/test_stats_extrapolation.py 2 files changed, 4 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/16756/1 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I16c11aa0a1ec2d4569c272d2454915041039f950 Gerrit-Change-Number: 16756 Gerrit-PatchSet: 1 Gerrit-Owner: Qifan Chen
[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 ) Change subject: IMPALA-10314: Optimize planning time for simple limits .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7703/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 7 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Nov 2020 19:00:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 ) Change subject: IMPALA-10314: Optimize planning time for simple limits .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7702/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 6 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Nov 2020 18:55:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 ) Change subject: IMPALA-10314: Optimize planning time for simple limits .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/cup/sql-parser.cup@3115 PS5, Line 3115: // clause. An attempt was made to set this for individual exprs > I guess this is a bit limiting in the it applies only to the whole where cl That was my original attempt too: setting the hint at the predicate level, but I ran into shift/reduce conflicts. I gave it another try today with a few variations and still could not get it to work, so I have left a comment in the code about this. For reference, here's one change I tried: expr ::= non_pred_expr:e {: RESULT = e; :} | opt_plan_hints:pred_hints predicate:p {: p.setPredicateHints(pred_hints); RESULT = p; :}; This generated quite a few conflicts, for example: Warning : *** Shift/Reduce conflict found in state #253 between opt_plan_hints ::= (*) and case_expr ::= (*) KW_CASE expr case_when_clause_list case_else_clause KW_END and case_expr ::= (*) KW_CASE case_when_clause_list case_else_clause KW_END under symbol KW_CASE Resolved in favor of shifting. Warning : *** Shift/Reduce conflict found in state #253 between opt_plan_hints ::= (*) and cast_expr ::= (*) KW_CAST LPAREN expr KW_AS type_def cast_format_val RPAREN under symbol KW_CAST Resolved in favor of shifting. .. ... http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/analysis/Predicate.java File fe/src/main/java/org/apache/impala/analysis/Predicate.java: http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/analysis/Predicate.java@30 PS5, Line 30: { > maybe hasAlwaysTrueHint_ just to make it crystal-clear that it's not actual Changed this, along with the names of the setter and getter.
http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@869 PS5, Line 869: if ((fsHasBlocks && fd.getNumFileBlocks() == 0) > nit: use braces for multi-line if Done http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@870 PS5, Line 870: d.getFileLength() < > We already had to deal with a similar issue here I will follow up with the doc writer on this. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 7 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Nov 2020 18:47:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits
Hello Qifan Chen, Shant Hovsepian, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16723 to look at the new patch set (#7). Change subject: IMPALA-10314: Optimize planning time for simple limits .. IMPALA-10314: Optimize planning time for simple limits This patch optimizes the planning time for simple limit queries by only considering a minimal set of partitions whose file descriptors add up to N (the specified limit). Each file is conservatively estimated to contain 1 row. This reduces the number of partitions processed by HdfsScanNode.computeScanRangeLocations() which, according to query profiling, has been the main contributor to planning time, especially for a large number of partitions. Further, within each partition, we only consider as many non-empty files as are needed to bring the total to N. This is an opt-in optimization. A new planner option, OPTIMIZE_SIMPLE_LIMIT, enables it. In addition, if there's a WHERE clause, it must have an 'always_true' hint for the optimization to be considered. For example: set optimize_simple_limit = true; SELECT * FROM T WHERE /* +always_true */ LIMIT 10; If the partitions contain too many empty files, the query may return fewer than N rows, although the rows it does return are still valid. Testing: - Added planner tests for the optimization - Ran query_test.py tests with optimize_simple_limit enabled - Added an e2e test. Since result rows are non-deterministic, only a simple count(*) query on top of a subquery with a limit was added.
Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java M fe/src/main/java/org/apache/impala/analysis/Predicate.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test M testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test 16 files changed, 505 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16723/7 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 7 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
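The partition and file selection described in the IMPALA-10314 commit message can be sketched as a greedy walk over partitions. This is a minimal Python illustration under stated assumptions, not the Java planner code: `FileDesc`, `Partition`, and `select_files_for_limit` are hypothetical names, partitions are visited in list order, and a file counts as empty when its length is 0 (the patch's own emptiness check in HdfsScanNode may differ).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FileDesc:
    path: str
    length: int  # file size in bytes


@dataclass
class Partition:
    name: str
    files: List[FileDesc] = field(default_factory=list)


def select_files_for_limit(partitions: List[Partition],
                           limit: int) -> List[Tuple[str, List[FileDesc]]]:
    """Pick the fewest partitions/files whose estimated row count reaches `limit`.

    Each non-empty file is conservatively estimated to hold one row, so the
    walk stops as soon as `limit` non-empty files have been chosen.
    """
    selected = []
    remaining = limit
    for part in partitions:
        if remaining <= 0:
            break  # later partitions are never examined
        chosen = []
        for fd in part.files:
            if fd.length == 0:
                continue  # empty files contribute no rows
            chosen.append(fd)
            remaining -= 1
            if remaining == 0:
                break  # only as many files as needed from this partition
        if chosen:
            selected.append((part.name, chosen))
    return selected


parts = [
    Partition("p=1", [FileDesc("a", 10), FileDesc("b", 0)]),
    Partition("p=2", [FileDesc("c", 5), FileDesc("d", 7), FileDesc("e", 3)]),
    Partition("p=3", [FileDesc("f", 8)]),
]
for name, files in select_files_for_limit(parts, 3):
    print(name, [fd.path for fd in files])
# p=1 ['a']
# p=2 ['c', 'd']
```

With a limit of 3, partition p=3 is never visited, which is where the planning-time saving would come from. A chosen non-empty file may still hold fewer rows than assumed, which is why the commit message warns that the query can return fewer than N rows.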
[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits
Hello Qifan Chen, Shant Hovsepian, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16723 to look at the new patch set (#6). Change subject: IMPALA-10314: Optimize planning time for simple limits .. IMPALA-10314: Optimize planning time for simple limits This patch optimizes the planning time for simple limit queries by only considering a minimal set of partitions whose file descriptors add up to N (the specified limit). Each file is conservatively estimated to contain 1 row. This reduces the number of partitions processed by HdfsScanNode.computeScanRangeLocations() which, according to query profiling, has been the main contributor to planning time, especially for a large number of partitions. Further, within each partition, we only consider as many non-empty files as are needed to bring the total to N. This is an opt-in optimization. A new planner option, OPTIMIZE_SIMPLE_LIMIT, enables it. In addition, if there's a WHERE clause, it must have an 'always_true' hint for the optimization to be considered. For example: set optimize_simple_limit = true; SELECT * FROM T WHERE /* +always_true */ LIMIT 10; If the partitions contain too many empty files, the query may return fewer than N rows, although the rows it does return are still valid. Testing: - Added planner tests for the optimization - Ran query_test.py tests with optimize_simple_limit enabled - Added an e2e test. Since result rows are non-deterministic, only a simple count(*) query on top of a subquery with a limit was added.
Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java M fe/src/main/java/org/apache/impala/analysis/Predicate.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test M testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test 16 files changed, 505 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16723/6 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 6 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16755 ) Change subject: IMPALA-9856: Enable result spooling by default. .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7701/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4 Gerrit-Change-Number: 16755 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 20 Nov 2020 18:27:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16755 ) Change subject: IMPALA-9856: Enable result spooling by default. .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_admission_controller.py File tests/custom_cluster/test_admission_controller.py: http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_admission_controller.py@873 PS1, Line 873: flake8: E203 whitespace before ':' http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_observability.py File tests/custom_cluster/test_observability.py: http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_observability.py@38 PS1, Line 38: flake8: E203 whitespace before ':' http://gerrit.cloudera.org:8080/#/c/16755/1/tests/query_test/test_udfs.py File tests/query_test/test_udfs.py: http://gerrit.cloudera.org:8080/#/c/16755/1/tests/query_test/test_udfs.py@623 PS1, Line 623: flake8: E203 whitespace before ':' http://gerrit.cloudera.org:8080/#/c/16755/1/tests/query_test/test_udfs.py@630 PS1, Line 630: _ flake8: E501 line too long (98 > 90 characters) -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4 Gerrit-Change-Number: 16755 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 20 Nov 2020 18:06:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16755 ) Change subject: IMPALA-9856: Enable result spooling by default. .. IMPALA-9856: Enable result spooling by default. Result spooling has been relatively stable since it was introduced, and it has several benefits described in IMPALA-8656. This patch enables result spooling (the SPOOL_QUERY_RESULTS query option) by default. Some tests also need to be adjusted to account for result spooling being enabled by default. The adjustment categories, and the tests that fall under each, are as follows.
Change in assertions:
- PlannerTest
- TpcdsPlannerTest
- custom_cluster/test_admission_controller.py::TestAdmissionController::test_dedicated_coordinator_planner_estimates
- custom_cluster/test_admission_controller.py::TestAdmissionController::test_memory_rejection
- custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_mem_limit_configs
- metadata/test_explain.py::TestExplain::test_explain_level2
- metadata/test_explain.py::TestExplain::test_explain_level3
- metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation
Increase BUFFER_POOL_LIMIT:
- query_test/test_queries.py::TestQueries::test_analytic_fns
- query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filter_reservation
- query_test/test_spilling.py::TestSpillingBroadcastJoins::test_spilling_broadcast_joins
- query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_aggs
- query_test/test_udfs.py::TestUdfExecution::test_mem_limits
Increase MEM_LIMIT:
- query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling
Increase MAX_ROW_SIZE:
- custom_cluster/test_parquet_max_page_header.py::TestParquetMaxPageHeader::test_large_page_header_config
- query_test/test_scanners.py::TestTextSplitDelimiters::test_text_split_across_buffers_delimiter
Disable result spooling to maintain assertions:
- custom_cluster/test_admission_controller.py::TestAdmissionController::test_set_request_pool
- custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_host_memory
- custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_pool_memory
- custom_cluster/test_admission_controller.py::TestAdmissionController::test_queue_reasons_memory
- custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_fetched_rows
- custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_finished_query
- custom_cluster/test_scratch_disk.py::TestScratchDir::test_no_dirs
- custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_existing_dirs
- custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_writable_dirs
- query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan
- query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_kudu_scan_mem_usage
- query_test/test_query_mem_limit.py::TestCodegenMemLimit::test_codegen_mem_limit
- query_test/test_query_mem_limit.py::TestQueryMemLimit::test_mem_limit
- query_test/test_sort.py::TestQueryFullSort::test_multiple_mem_limits_full_output
- query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive
- shell/test_shell_client.py::TestShellClient::test_fetch_size
Disable result spooling to avoid a crash or silent error:
- custom_cluster/test_observability.py::TestObservability::test_host_profile_jvm_gc_metrics
- query_test/test_insert.py::TestInsertQueries::test_insert_large_string
- query_test/test_queries.py::TestQueriesParquetTables::test_very_large_strings
- query_test/test_scanners.py::TestWideRow::test_wide_row
- query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_large_rows
- query_test/test_udfs.py::TestUdfExecution::test_java_udfs
- query_test/test_udfs.py::TestUdfTargeted::test_udf_profile
Further investigation is needed to address the last category.
Testing:
- Passed exhaustive tests.
Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4 --- M common/thrift/ImpalaInternalService.thrift M testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection-hdfs-num-rows-est-enabled.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M
[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.
Csaba Ringhofer has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16728 ) Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift representation. .. IMPALA-10332: Add file formats to HdfsScanNode's thrift representation. List all file formats that an HdfsScanNode needs to process in any fragment instance. Some file formats may not be needed by every fragment instance. This is a step towards sharing codegen between different Impala backends. Using the file formats provided in the thrift message, a backend can codegen code for file formats that are not needed in its own process but are needed by fragment instances running on other backends, and the resulting binary can be shared between multiple backends. Codegen for file formats will be driven by the thrift message rather than by what the current backend actually needs. This leads to some extra work when a file format is not needed by the current backend and codegen sharing is not available (it is not implemented yet). However, the overall number of such cases is low. This patch also adds the file formats to the node's explain string at level 3. Testing: - Added tests to verify that the file formats are present in the explain string at level 3.
Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d Reviewed-on: http://gerrit.cloudera.org:8080/16728 Tested-by: Impala Public Jenkins Reviewed-by: Csaba Ringhofer --- M be/src/exec/hdfs-scan-node-base.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test 5 files changed, 52 insertions(+), 8 deletions(-) Approvals: Impala Public Jenkins: Verified Csaba Ringhofer: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/16728 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d Gerrit-Change-Number: 16728 Gerrit-PatchSet: 11 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins
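A minimal sketch of the idea in IMPALA-10332, assuming a hypothetical mapping from each fragment instance to the (path, format) pairs of its scan ranges: the node-level list is the union of formats over all instances, and a backend that codegens from that list may do extra work for formats it never reads itself. The names `node_file_formats` and `extra_codegen_formats` are illustrative only and are not Impala APIs; the real change lives in HdfsScanNode.java and hdfs-scan-node-base.cc.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input: fragment instance id -> list of (file path, file format).
ScanRanges = Dict[str, List[Tuple[str, str]]]


def node_file_formats(instance_ranges: ScanRanges) -> List[str]:
    """Union of file formats over every fragment instance of one scan node."""
    formats: Set[str] = set()
    for ranges in instance_ranges.values():
        formats.update(fmt for _path, fmt in ranges)
    return sorted(formats)


def extra_codegen_formats(instance_ranges: ScanRanges, instance_id: str) -> List[str]:
    """Formats a backend would codegen from the node-level list but never scan itself."""
    local = {fmt for _path, fmt in instance_ranges.get(instance_id, [])}
    return [fmt for fmt in node_file_formats(instance_ranges) if fmt not in local]


if __name__ == "__main__":
    ranges = {
        "instance-0": [("/t/p=1/a.parq", "PARQUET")],
        "instance-1": [("/t/p=2/b.txt", "TEXT"), ("/t/p=2/c.parq", "PARQUET")],
    }
    print(node_file_formats(ranges))                    # ['PARQUET', 'TEXT']
    print(extra_codegen_formats(ranges, "instance-0"))  # ['TEXT']
```

Without codegen sharing, the `['TEXT']` result for instance-0 corresponds to the "extra work" the commit message mentions; once sharing exists, that code could be reused by instance-1's backend.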
[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16728 ) Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift representation. .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16728 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d Gerrit-Change-Number: 16728 Gerrit-PatchSet: 10 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 20 Nov 2020 17:54:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16728 ) Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift representation. .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16728 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d Gerrit-Change-Number: 16728 Gerrit-PatchSet: 10 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 20 Nov 2020 17:25:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific cases
Zoltan Borok-Nagy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16753 ) Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific cases .. IMPALA-10346: Rename Iceberg test tables' name with specific cases Some Iceberg-related test cases used non-descriptive table names, such as iceberg_test1, iceberg_test2, and so on, which hurt readability. This change renames these Iceberg test tables after the specific cases they cover. Testing: - Renamed tables in iceberg-create.test - Renamed tables in iceberg-alter.test Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5 Reviewed-on: http://gerrit.cloudera.org:8080/16753 Reviewed-by: Zoltan Borok-Nagy Tested-by: Impala Public Jenkins --- M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test 2 files changed, 62 insertions(+), 62 deletions(-) Approvals: Zoltan Borok-Nagy: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/16753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5 Gerrit-Change-Number: 16753 Gerrit-PatchSet: 3 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng
[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific cases
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16753 ) Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific cases .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5 Gerrit-Change-Number: 16753 Gerrit-PatchSet: 2 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 20 Nov 2020 16:21:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16656 ) Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7700/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a Gerrit-Change-Number: 16656 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 20 Nov 2020 15:36:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 ) Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog .. Patch Set 7: Thanks for the review! After +2 we can run the verify job with DRY_RUN=false, so on success the job submits the patch. -- To view, visit http://gerrit.cloudera.org:8080/16721 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 Gerrit-Change-Number: 16721 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 20 Nov 2020 15:18:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16656 ) Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions .. IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions These functions can be used to get cardinality estimates of data using the CPC algorithm from Apache DataSketches. ds_cpc_sketch() receives a dataset, e.g. a column from a table, and returns a serialized CPC sketch in string format. This can be written to a table or be fed directly to ds_cpc_estimate(), which returns the cardinality estimate for that sketch. Similar to the HLL sketch, the primary use-case for the CPC sketch is counting distinct values as a stream, and then merging multiple sketches together for a total distinct count. For more details about Apache DataSketches' CPC see: http://datasketches.apache.org/docs/CPC/CPC.html For a Figures-of-Merit comparison of the HLL and CPC sketches see: https://datasketches.apache.org/docs/DistinctCountMeritComparisons.html Testing: - Added some tests running estimates for small datasets where the amount of data is small enough to get the correct results. - Ran manual tests on tpch_parquet.lineitem to compare performance with ndv(). Depending on data characteristics, ndv() appears 2x-3x faster. CPC gives a closer estimate than the current ndv().
CPC is more accurate than HLL in some cases. Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/datasketches-common.cc M be/src/exprs/datasketches-common.h M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README A testdata/data/cpc_sketches_from_hive.parquet A testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test M tests/query_test/test_datasketches.py 12 files changed, 398 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/16656/4 -- To view, visit http://gerrit.cloudera.org:8080/16656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a Gerrit-Change-Number: 16656 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
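CPC itself is considerably more involved than a short example allows. The sketch below swaps in the much simpler K-Minimum-Values (KMV) distinct-count sketch, which is plainly not CPC and not the DataSketches API, only to illustrate the workflow described above: build a sketch per dataset, serialize it to a string that could be stored in a table, merge sketches, and read off an estimate. As in the commit's small-dataset tests, the count is exact while the number of distinct values stays below the sketch size `k`. All names here (`KmvSketch`, `_hash01`) are hypothetical.

```python
import hashlib
import heapq
from typing import Iterable, List


def _hash01(value) -> float:
    """Map a value to a pseudo-uniform float in [0, 1) using 53 bits of SHA-1."""
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / float(1 << 53)


class KmvSketch:
    """K-Minimum-Values distinct-count sketch (a stand-in for CPC, not CPC itself)."""

    def __init__(self, k: int = 256):
        self.k = k
        self._hashes = set()

    def _kept(self) -> List[float]:
        # The k smallest distinct hash values, in ascending order.
        return heapq.nsmallest(self.k, self._hashes)

    def update(self, value) -> None:
        self._hashes.add(_hash01(value))
        if len(self._hashes) > 2 * self.k:  # trim lazily to bound memory
            self._hashes = set(self._kept())

    def update_all(self, values: Iterable) -> "KmvSketch":
        for v in values:
            self.update(v)
        return self

    def merge(self, other: "KmvSketch") -> "KmvSketch":
        # Each side still holds at least its own k smallest hashes, so the
        # k smallest of the union are the k smallest of the combined stream.
        merged = KmvSketch(min(self.k, other.k))
        merged._hashes = set(heapq.nsmallest(merged.k, self._hashes | other._hashes))
        return merged

    def estimate(self) -> float:
        kept = self._kept()
        if len(kept) < self.k:  # fewer than k distinct hashes: the count is exact
            return float(len(kept))
        return (self.k - 1) / kept[-1]

    def serialize(self) -> str:
        # Plain-text form that could be stored in a STRING column.
        return f"{self.k};" + ",".join(repr(h) for h in self._kept())

    @staticmethod
    def deserialize(text: str) -> "KmvSketch":
        k_part, _, hashes_part = text.partition(";")
        sketch = KmvSketch(int(k_part))
        sketch._hashes = {float(h) for h in hashes_part.split(",") if h}
        return sketch


if __name__ == "__main__":
    day1 = KmvSketch().update_all(range(0, 100))
    day2 = KmvSketch().update_all(range(50, 150))
    stored = day1.serialize()  # e.g. written to a table
    total = KmvSketch.deserialize(stored).merge(day2)
    print(day1.estimate(), total.estimate())  # 100.0 150.0 (exact below k)
```

Going by the commit message, the SQL counterpart of `estimate()` applied to a freshly built sketch would be `ds_cpc_estimate(ds_cpc_sketch(col))`; KMV is used here only because it fits in a few lines, while CPC is designed to be more space-efficient for a given accuracy.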
[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 ) Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6686/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16721 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 Gerrit-Change-Number: 16721 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 20 Nov 2020 15:10:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 ) Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog .. Patch Set 7: Code-Review+2 Thanks for this new feature, Zoltan, LGTM! -- To view, visit http://gerrit.cloudera.org:8080/16721 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 Gerrit-Change-Number: 16721 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 20 Nov 2020 15:10:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7699/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 12 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 20 Nov 2020 14:45:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Qifan Chen has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate This patch adds logic that uses the min/max stats of Parquet row groups or pages to skip these entities when they cannot satisfy an equi-join predicate. A new class of predicates, called overlap predicates, is introduced to determine whether a Parquet row group or page overlaps with a range computed from the hash join. If not, the entire row group or page is skipped. The new overlap predicates coexist with the existing min/max conjuncts that are introduced based on the local scan predicates. Both classes of predicates can work individually or together. The overlap predicates are evaluated after the existing min/max conjuncts. To be done: 1. Handle all data types; 2. Unit/performance testing; 3. Core testing.
Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 --- M be/src/exec/exec-node.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java 14 files changed, 386 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16720/12 -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 12 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
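A minimal sketch of the skipping rule described in IMPALA-10325, assuming integer min/max statistics and inclusive ranges: a row group or page can be skipped only when its [min, max] interval and the [min, max] range computed from the hash join's build side do not intersect. The function names and the `(min, max)` tuple representation are hypothetical; the actual patch implements this in C++ in the Parquet scanner and, per the to-do list, does not yet cover all data types.

```python
from typing import Callable, List, Optional, Tuple

Range = Tuple[int, int]  # (min, max), inclusive, from row-group or page statistics


def overlaps(stats: Range, filter_range: Range) -> bool:
    """True if [stats.min, stats.max] intersects [filter.min, filter.max]."""
    lo, hi = stats
    f_lo, f_hi = filter_range
    return lo <= f_hi and f_lo <= hi


def pages_to_read(page_stats: List[Range],
                  minmax_conjunct: Callable[[Range], bool],
                  join_range: Optional[Range]) -> List[int]:
    """Indexes of pages that survive both checks; all other pages are skipped.

    The local min/max conjunct is checked first and the overlap predicate
    second, mirroring the evaluation order described in the commit message.
    """
    keep = []
    for i, stats in enumerate(page_stats):
        if not minmax_conjunct(stats):
            continue  # ruled out by a local scan predicate
        if join_range is not None and not overlaps(stats, join_range):
            continue  # no value in this page can match the build side
        keep.append(i)
    return keep


def col_gt_15(stats: Range) -> bool:
    """Hypothetical local predicate `col > 15`: the page may hold a match."""
    return stats[1] > 15


if __name__ == "__main__":
    pages = [(1, 10), (11, 20), (21, 30)]
    build_side = (12, 25)  # min/max of the join keys seen by the hash join
    print(pages_to_read(pages, col_gt_15, build_side))          # [1, 2]
    print(pages_to_read(pages, lambda s: True, (31, 40)))       # []
```

Because both checks only ever discard pages whose values provably cannot qualify, applying them in either order yields the same surviving set; evaluating the cheaper local conjuncts first simply avoids some overlap checks.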
[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 ) Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7698/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16721 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 Gerrit-Change-Number: 16721 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 20 Nov 2020 13:40:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 ) Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog .. Patch Set 7: (2 comments) Thanks for the quick review! http://gerrit.cloudera.org:8080/#/c/16721/6/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java: http://gerrit.cloudera.org:8080/#/c/16721/6/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@84 PS6, Line 84: // Internal Iceberg table property that specifies the absolute path of the current : // table metadata. > We may add some explanation here: this property is only valid for 'hive.catalog' Done http://gerrit.cloudera.org:8080/#/c/16721/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test: http://gerrit.cloudera.org:8080/#/c/16721/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@127 PS6, Line 127: CREATE TABLE iceberg_hadoop_cat_with_metadata_locacti > Shall we add a test for HadoopCatalog here? Done -- To view, visit http://gerrit.cloudera.org:8080/16721 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 Gerrit-Change-Number: 16721 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 20 Nov 2020 13:18:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16721 to look at the new patch set (#7). Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog .. IMPALA-10152: Add support for Iceberg HiveCatalog HiveCatalog is one of Iceberg's catalog implementations. It uses the Hive metastore and it is the recommended catalog implementation when the table data is stored in object stores like S3. This commit updates the Iceberg version to a newer one, and it also retrieves Iceberg from the CDP distribution because that version of Iceberg is built against Hive 3 (Impala is only compatible with Hive 3). This commit makes HiveCatalog the default Iceberg catalog in Impala because it can be used in more environments (e.g. cloud stores), and it is more featureful. Also, other engines that store their table metadata in HMS will probably use HiveCatalog as well. Tables stored in HiveCatalog are similar to Kudu tables with HMS integration, i.e. modifying an Iceberg table via the Iceberg APIs also modifies the HMS table. So in CatalogOpExecutor we handle such Iceberg tables similarly to integrated Kudu tables. 
Testing: * Added e2e tests for creating, writing, and altering Iceberg tables * Added SHOW CREATE TABLE tests Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 --- M bin/impala-config.sh M common/thrift/CatalogObjects.thrift M fe/pom.xml M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 16 files changed, 577 insertions(+), 92 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16721/7 -- To view, visit http://gerrit.cloudera.org:8080/16721 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 Gerrit-Change-Number: 16721 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng
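A minimal sketch of the two decisions described in IMPALA-10152, under stated assumptions: the catalog implementation is assumed to be selected through a table property (assumed here to be named `iceberg.catalog`), with `hive.catalog` as the new default, and for HiveCatalog tables the Iceberg API call is assumed to create the HMS entry itself, so no separate HMS call is issued, mirroring Kudu tables with HMS integration. The property name, the non-Hive catalog values, the step strings, and the function names are all hypothetical illustrations, not Impala's CatalogOpExecutor code.

```python
from typing import Dict, List

# Assumed property name and values; only 'hive.catalog' appears in the review thread.
CATALOG_PROP = "iceberg.catalog"
HIVE_CATALOG = "hive.catalog"
HADOOP_CATALOG = "hadoop.catalog"
HADOOP_TABLES = "hadoop.tables"
DEFAULT_CATALOG = HIVE_CATALOG  # HiveCatalog becomes the default with this change


def iceberg_catalog(tbl_props: Dict[str, str]) -> str:
    """Catalog implementation for an Iceberg table, falling back to the default."""
    return tbl_props.get(CATALOG_PROP, DEFAULT_CATALOG)


def create_steps(tbl_props: Dict[str, str]) -> List[str]:
    """Hypothetical ordered steps for CREATE TABLE on an Iceberg table."""
    if iceberg_catalog(tbl_props) == HIVE_CATALOG:
        # Like Kudu tables with HMS integration: the Iceberg API call also
        # creates the HMS entry, so no separate HMS createTable is issued.
        return ["iceberg.createTable"]
    # Other catalogs keep Iceberg metadata outside HMS (assumption), so the
    # HMS entry is created as a separate step.
    return ["iceberg.createTable", "hms.createTable"]


if __name__ == "__main__":
    print(create_steps({}))                             # ['iceberg.createTable']
    print(create_steps({CATALOG_PROP: HADOOP_TABLES}))  # ['iceberg.createTable', 'hms.createTable']
```

The point of the branch is the one the commit message makes: for HiveCatalog tables, issuing a second HMS update after the Iceberg call would duplicate work the Iceberg library already did.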
[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16728 ) Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift representation. .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7697/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16728 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d Gerrit-Change-Number: 16728 Gerrit-PatchSet: 9 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 20 Nov 2020 12:16:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16728 ) Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift representation. .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16728 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d Gerrit-Change-Number: 16728 Gerrit-PatchSet: 9 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 20 Nov 2020 11:57:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16728 ) Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift representation. .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6685/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16728 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d Gerrit-Change-Number: 16728 Gerrit-PatchSet: 10 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 20 Nov 2020 11:55:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.
Daniel Becker has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/16728 ) Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift representation. .. IMPALA-10332: Add file formats to HdfsScanNode's thrift representation. List all file formats that an HdfsScanNode needs to process in any fragment instance. Some of these file formats may not be needed in every fragment instance. This is a step towards sharing codegen between different Impala backends. Using the file formats provided in the thrift message, a backend can generate code for file formats that are not needed in its own process but are needed by fragment instances running on other backends, and the resulting binary can be shared between multiple backends. Codegen for file formats will be driven by the thrift message, not by what the local backend actually needs. This leads to some extra work when a file format is not needed by the current backend and codegen sharing is not available (codegen sharing is not implemented yet). However, the overall number of such cases is low. Also add the file formats to the node's explain string at level 3. Testing: - Added tests to verify that the file formats are present in the explain string at level 3.
Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d --- M be/src/exec/hdfs-scan-node-base.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test 5 files changed, 52 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/16728/9 -- To view, visit http://gerrit.cloudera.org:8080/16728 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d Gerrit-Change-Number: 16728 Gerrit-PatchSet: 9 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific cases
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16753 ) Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific cases .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5 Gerrit-Change-Number: 16753 Gerrit-PatchSet: 2 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 20 Nov 2020 11:47:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 ) Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog .. Patch Set 6: (2 comments) Thanks for a quick turnaround, just two nits. http://gerrit.cloudera.org:8080/#/c/16721/6/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java: http://gerrit.cloudera.org:8080/#/c/16721/6/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@84 PS6, Line 84: // Internal Iceberg table property that specifies the absolute path of the current : // table metadata. We may add some explanation here: this property is only valid for 'hive.catalog' or 'HiveCatalog' http://gerrit.cloudera.org:8080/#/c/16721/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test: http://gerrit.cloudera.org:8080/#/c/16721/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@127 PS6, Line 127: CREATE TABLE iceberg_hive_cat_with_metadata_locaction Shall we add a test for HadoopCatalog here? -- To view, visit http://gerrit.cloudera.org:8080/16721 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 Gerrit-Change-Number: 16721 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 20 Nov 2020 11:22:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific cases
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16753 ) Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific cases .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7696/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5 Gerrit-Change-Number: 16753 Gerrit-PatchSet: 2 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 20 Nov 2020 11:17:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific cases
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16753 ) Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific cases .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6684/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5 Gerrit-Change-Number: 16753 Gerrit-PatchSet: 2 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 20 Nov 2020 10:56:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific cases
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16753 ) Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific cases .. Patch Set 2: (2 comments) Thanks for review! http://gerrit.cloudera.org:8080/#/c/16753/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16753/1//COMMIT_MSG@7 PS1, Line 7: cases > nit: probably the word 'cases' fits better here Done http://gerrit.cloudera.org:8080/#/c/16753/1//COMMIT_MSG@10 PS1, Line 10: iceberg_test > nit: iceberg_test1 Done -- To view, visit http://gerrit.cloudera.org:8080/16753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5 Gerrit-Change-Number: 16753 Gerrit-PatchSet: 2 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 20 Nov 2020 10:56:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific cases
wangsheng has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/16753 ) Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific cases .. IMPALA-10346: Rename Iceberg test tables' name with specific cases Some Iceberg-related test cases used non-descriptive table names, such as iceberg_test1, iceberg_test2, and so on, which hurt readability. This change renames these Iceberg test tables after the specific cases they cover. Testing: - Renamed tables in iceberg-create.test - Renamed tables in iceberg-alter.test Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5 --- M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test 2 files changed, 62 insertions(+), 62 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/16753/2 -- To view, visit http://gerrit.cloudera.org:8080/16753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5 Gerrit-Change-Number: 16753 Gerrit-PatchSet: 2 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific situations
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16753 ) Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific situations .. Patch Set 1: Code-Review+2 (2 comments) Thanks for making the tests more readable! http://gerrit.cloudera.org:8080/#/c/16753/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16753/1//COMMIT_MSG@7 PS1, Line 7: situations nit: probably the word 'cases' fits better here http://gerrit.cloudera.org:8080/#/c/16753/1//COMMIT_MSG@10 PS1, Line 10: iceberg_tes1 nit: iceberg_test1 -- To view, visit http://gerrit.cloudera.org:8080/16753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5 Gerrit-Change-Number: 16753 Gerrit-PatchSet: 1 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 20 Nov 2020 10:25:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 ) Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7695/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16721 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 Gerrit-Change-Number: 16721 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 20 Nov 2020 10:05:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 ) Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7694/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16721 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 Gerrit-Change-Number: 16721 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 20 Nov 2020 10:04:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 ) Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog .. Patch Set 6: (3 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@419 PS4, Line 419: cebergTable.METAD > This table property is generated by HiveCatalog, I'm curious about: Good point! 'metadata_location' is internal to Iceberg, so we shouldn't allow users to modify it. Updated the code accordingly. http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java: http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java@65 PS4, Line 65: return hiveCatalog_.createTable(identifier, schema, spec, location, properties); : } > nits: One line is ok, unnecessary for two lines. Done http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java File fe/src/main/java/org/apache/impala/util/IcebergUtil.java: http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@242 PS4, Line 242:* Get TIcebergFileFormat from a string, usually from table properties. :* > Maybe we need add a comment here, since we change the default value to 'PAR Updated the code a bit. Now this method returns PARQUET when 'format' is null, which can happen when the table was created by other engines, and it returns null when the format string is invalid.
-- To view, visit http://gerrit.cloudera.org:8080/16721 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 Gerrit-Change-Number: 16721 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 20 Nov 2020 09:45:15 + Gerrit-HasComments: Yes
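The reply above describes the updated contract of the format-parsing helper in IcebergUtil.java: PARQUET when the format is missing, null when the format string is invalid. A minimal Java sketch of that contract follows. It assumes a hypothetical `IcebergFileFormat` enum standing in for the Thrift-generated `TIcebergFileFormat`, and it assumes case-insensitive matching; it is not the code from the patch.

```java
import java.util.Locale;

// Hypothetical stand-in for the Thrift-generated TIcebergFileFormat enum.
enum IcebergFileFormat { PARQUET, ORC }

final class IcebergFormatSketch {
  private IcebergFormatSketch() {}

  /**
   * Sketch of the contract described in the review reply (not Impala's code):
   *  - null input (e.g. table created by another engine) -> PARQUET default
   *  - recognized name (matched case-insensitively, an assumption) -> that format
   *  - anything else -> null, so the caller can report an invalid format
   */
  static IcebergFileFormat getIcebergFileFormat(String format) {
    if (format == null) return IcebergFileFormat.PARQUET;
    switch (format.toUpperCase(Locale.ROOT)) {
      case "PARQUET": return IcebergFileFormat.PARQUET;
      case "ORC": return IcebergFileFormat.ORC;
      default: return null;
    }
  }
}
```

Returning null for invalid input, rather than silently falling back to PARQUET, leaves it to the caller to turn a typo in the table properties into a user-visible error.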
[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16721 to look at the new patch set (#6). Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog .. IMPALA-10152: Add support for Iceberg HiveCatalog HiveCatalog is one of Iceberg's catalog implementations. It uses the Hive metastore and it is the recommended catalog implementation when the table data is stored in object stores like S3. This commit updates the Iceberg version to a newer one, and it also retrieves Iceberg from the CDP distribution because that version of Iceberg is built against Hive 3 (Impala is only compatible with Hive 3). This commit makes HiveCatalog the default Iceberg catalog in Impala because it can be used in more environments (e.g. cloud stores), and it is more featureful. Also, other engines that store their table metadata in HMS will probably use HiveCatalog as well. Tables stored in HiveCatalog are similar to Kudu tables with HMS integration, i.e. modifying an Iceberg table via the Iceberg APIs also modifies the HMS table. So in CatalogOpExecutor we handle such Iceberg tables similarly to integrated Kudu tables. 
Testing: * Added e2e tests for creating, writing, and altering Iceberg tables * Added SHOW CREATE TABLE tests Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 --- M bin/impala-config.sh M common/thrift/CatalogObjects.thrift M fe/pom.xml M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 16 files changed, 568 insertions(+), 92 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16721/6 -- To view, visit http://gerrit.cloudera.org:8080/16721 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 Gerrit-Change-Number: 16721 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng
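The commit message says HiveCatalog becomes the default Iceberg catalog in Impala. A minimal sketch of how such a default could be resolved from table properties is shown below. It assumes that `IcebergTable.ICEBERG_CATALOG` is the key `iceberg.catalog` and that the accepted values are `hadoop.tables`, `hadoop.catalog`, and `hive.catalog`; the `IcebergCatalogKind` enum and `CatalogSelectionSketch` class are hypothetical illustrations, not Impala's own types.

```java
import java.util.Map;

// Hypothetical enum naming the three Iceberg catalog kinds discussed for Impala.
enum IcebergCatalogKind { HADOOP_TABLES, HADOOP_CATALOG, HIVE_CATALOG }

final class CatalogSelectionSketch {
  // Assumed to match the key behind IcebergTable.ICEBERG_CATALOG.
  static final String ICEBERG_CATALOG = "iceberg.catalog";

  private CatalogSelectionSketch() {}

  /**
   * Resolves the catalog for a table: HiveCatalog when the property is absent
   * (the new default), the named catalog for a recognized value, and null for
   * an unrecognized value so the caller can raise an analysis error.
   */
  static IcebergCatalogKind getCatalog(Map<String, String> tblProperties) {
    String value = tblProperties.get(ICEBERG_CATALOG);
    if (value == null) return IcebergCatalogKind.HIVE_CATALOG;
    switch (value) {
      case "hadoop.tables": return IcebergCatalogKind.HADOOP_TABLES;
      case "hadoop.catalog": return IcebergCatalogKind.HADOOP_CATALOG;
      case "hive.catalog": return IcebergCatalogKind.HIVE_CATALOG;
      default: return null;
    }
  }
}
```

Defaulting on absence, instead of requiring the property, means tables created by other engines that register themselves in HMS resolve to HiveCatalog without extra configuration.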
[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16721 to look at the new patch set (#5). Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog .. IMPALA-10152: Add support for Iceberg HiveCatalog HiveCatalog is one of Iceberg's catalog implementations. It uses the Hive metastore and it is the recommended catalog implementation when the table data is stored in object stores like S3. This commit updates the Iceberg version to a newer one, and it also retrieves Iceberg from the CDP distribution because that version of Iceberg is built against Hive 3 (Impala is only compatible with Hive 3). This commit makes HiveCatalog the default Iceberg catalog in Impala because it can be used in more environments (e.g. cloud stores), and it is more featureful. Also, other engines that store their table metadata in HMS will probably use HiveCatalog as well. Tables stored in HiveCatalog are similar to Kudu tables with HMS integration, i.e. modifying an Iceberg table via the Iceberg APIs also modifies the HMS table. So in CatalogOpExecutor we handle such Iceberg tables similarly to integrated Kudu tables. 
Testing: * Added e2e tests for creating, writing, and altering Iceberg tables * Added SHOW CREATE TABLE tests Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 --- M bin/impala-config.sh M common/thrift/CatalogObjects.thrift M fe/pom.xml M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 16 files changed, 568 insertions(+), 92 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16721/5 -- To view, visit http://gerrit.cloudera.org:8080/16721 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 Gerrit-Change-Number: 16721 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng
[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific situations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16753 ) Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific situations .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7693/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5 Gerrit-Change-Number: 16753 Gerrit-PatchSet: 1 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 20 Nov 2020 09:19:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific situations
wangsheng has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16753 ) Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific situations .. IMPALA-10346: Rename Iceberg test tables' name with specific situations We used some non-descriptive table names in Iceberg-related test cases, such as iceberg_tes1/iceberg_test2 and so on, which resulted in poor readability. So it is better to rename these Iceberg test tables after the specific situations they cover. Testing: - Renamed tables' name in iceberg-create.test - Renamed tables' name in iceberg-alter.test Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5 --- M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test 2 files changed, 62 insertions(+), 62 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/16753/1 -- To view, visit http://gerrit.cloudera.org:8080/16753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5 Gerrit-Change-Number: 16753 Gerrit-PatchSet: 1 Gerrit-Owner: wangsheng
[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16721 ) Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@419 PS4, Line 419: metadata_location This table property is generated by HiveCatalog, I'm curious about: 1. Can we set this table property when creating a table? If not, I think we need to add a check in the code; 2. Can we alter a table to set this table property? If not, we also need to add a check in the code; 3. Maybe we should define a static variable in IcebergTable.java, just like ICEBERG_CATALOG, and reference it here. As far as I know, 'metadata_location' refers to a metadata file's absolute path, and maybe this property cannot be modified manually. http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java: http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java@65 PS4, Line 65: return hiveCatalog_.createTable(identifier, schema, spec, location, : properties); nits: One line is ok, unnecessary for two lines.
http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java File fe/src/main/java/org/apache/impala/util/IcebergUtil.java: http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@242 PS4, Line 242:* Get TIcebergFileFormat from a string, usually from table properties :*/ Maybe we need to add a comment here, since we changed the default value from 'null' to 'PARQUET'. -- To view, visit http://gerrit.cloudera.org:8080/16721 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197 Gerrit-Change-Number: 16721 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 20 Nov 2020 07:59:49 + Gerrit-HasComments: Yes
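The review asks for a check that prevents users from setting 'metadata_location' through CREATE TABLE or ALTER TABLE, with the key held in a named constant instead of a string literal. A minimal sketch of such a check follows. The constant name `METADATA_LOCATION` and the class `IcebergPropertyCheckSketch` are hypothetical, and `IllegalArgumentException` stands in for whatever analysis exception Impala actually raises.

```java
import java.util.Map;

final class IcebergPropertyCheckSketch {
  // Hypothetical constant; the review suggests a named constant in IcebergTable.java,
  // similar to ICEBERG_CATALOG, instead of a string literal at each use site.
  static final String METADATA_LOCATION = "metadata_location";

  private IcebergPropertyCheckSketch() {}

  /**
   * Rejects user-supplied TBLPROPERTIES that try to set a key HiveCatalog manages
   * internally. The same check would be called from both the CREATE TABLE and the
   * ALTER TABLE SET TBLPROPERTIES analysis paths.
   */
  static void checkNoReservedProperties(Map<String, String> userProperties) {
    if (userProperties.containsKey(METADATA_LOCATION)) {
      throw new IllegalArgumentException("Setting the '" + METADATA_LOCATION
          + "' table property is not supported for Iceberg tables.");
    }
  }
}
```

Putting the check in one helper keeps the CREATE and ALTER paths consistent, which covers points 1 and 2 of the review with a single rule.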