[Impala-ASF-CR] IMPALA-10388: [DOCS] add limitations on mask functions
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16861 ) Change subject: IMPALA-10388: [DOCS] add limitations on mask functions .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/16861/1/docs/topics/impala_authorization.xml File docs/topics/impala_authorization.xml: http://gerrit.cloudera.org:8080/#/c/16861/1/docs/topics/impala_authorization.xml@693 PS1, Line 693: Changes in Hive UDF implemented through "GenericUDF" supports mask functions. Even though : Impala users can call Hive UDFs, Impala does not yet support new Hive UDFs based on the : GenericUDF class. Due to the lack of a corresponding framework for GenericUDF in Impala, : this release introduced some of these mask functions through overloads. The first sentense make me confused. The GenericUDF framework and the mask functions in Hive have been implemented for years. Not sure what "Changes" means. I think this explains the limitation better: The mask functions in Hive are GenericUDFs. However, Impala currently doesn't support Hive GenericUDFs, so you can't use Hive's mask functions in Impala. Impala has builtin mask functions. They are implemented by overloads. Thus, when using mask functions, not all parameter combinations are supported in Impala. I'm not a native English speaker. Please help to correct the grammar mistakes and polish the content. :) http://gerrit.cloudera.org:8080/#/c/16861/1/docs/topics/impala_authorization.xml@697 PS1, Line 697: The following list includes all the overloads that are introduced in this release. This is not a release doc, but a doc that will be used in all releases. I don't think we should mentione "in this release" here. BTW, the mask functions are added in Impala-3.4, not the comming release version (4.0). Can you update this too: The following list includes all the overloads that are implemented. http://gerrit.cloudera.org:8080/#/c/16861/1/docs/topics/impala_authorization.xml@710 PS1, Line 710: The function mask(FLOAT) is not implemented in Impala. Do not use Redact(MASK) on : float/double/decimal types as Hive always returns NULL for them since it is not redact. : You can use a more efficient Nullify policy (MASK_NULL) for the same purpose. This is just an example explaining the overload, mask(FLOAT), is currently missing in Impala. And then provides a work around. I think we can update this paragraph to As an example, the function mask(FLOAT) is currently not implemented in Impala. Do not use Redact(MASK) on float/double/decimal types as Hive always returns NULL for them so it is not redact actually. If you do want to mask float/double/decimal to NULLs, you can use a more efficient Nullify policy (MASK_NULL) for the same purpose. -- To view, visit http://gerrit.cloudera.org:8080/16861 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37f0bcf4cf586cc5cfd03e4df68443967b6bb88f Gerrit-Change-Number: 16861 Gerrit-PatchSet: 1 Gerrit-Owner: Shajini Thayasingh Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 17 Dec 2020 07:32:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9865: part 2/2: add verbosity to profile tool
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16881 ) Change subject: IMPALA-9865: part 2/2: add verbosity to profile tool .. Patch Set 6: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6789/ -- To view, visit http://gerrit.cloudera.org:8080/16881 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I82618a813e29af7996dfaed78873b2a73bc0231d Gerrit-Change-Number: 16881 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 17 Dec 2020 06:19:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10211 (Part 1): Add support for role-related statements
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16837 ) Change subject: IMPALA-10211 (Part 1): Add support for role-related statements .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6791/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16837 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2b204e62a1d8ae1932d955b4efc28be22202860 Gerrit-Change-Number: 16837 Gerrit-PatchSet: 7 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 17 Dec 2020 05:17:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16825 ) Change subject: IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only .. Patch Set 14: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 Gerrit-Change-Number: 16825 Gerrit-PatchSet: 14 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 17 Dec 2020 03:16:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16825 ) Change subject: IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only .. Patch Set 14: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6790/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 Gerrit-Change-Number: 16825 Gerrit-PatchSet: 14 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 17 Dec 2020 03:16:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16825 ) Change subject: IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only .. Patch Set 13: Code-Review+2 Thanks for modify the code -- To view, visit http://gerrit.cloudera.org:8080/16825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 Gerrit-Change-Number: 16825 Gerrit-PatchSet: 13 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 17 Dec 2020 03:15:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10336: Coordinator return incorrect error to client
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16849 ) Change subject: IMPALA-10336: Coordinator return incorrect error to client .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7875/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16849 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I75f252e43006c6ff6980800e3254672de396b318 Gerrit-Change-Number: 16849 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 17 Dec 2020 03:14:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10336: Coordinator return incorrect error to client
Wenzhe Zhou has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16849 ) Change subject: IMPALA-10336: Coordinator return incorrect error to client .. IMPALA-10336: Coordinator return incorrect error to client Due to race condition, coordinator could set execution status as RPC aborted due to cancellation. This internal error should not be returned to client. This patch fixed the issue by setting the backend status as CANCELLED instead of ABORTED if the exec RPC was aborted due to cancellation. Testing: - Manual tests Since this is a racy bug, I could only reproduce the situation by adding some artificial delays in 3 places: QueryExecMgr.StartQuery(), Coordinator.UpdateBackendExecStatus(), and Coordinator::StartBackendExec() when running test case test_scanners.py::TestOrc::test_type_conversions_hive3. Verified that the issue did not happen after applying this patch by running test_scanners.py::TestOrc::test_type_conversions_hive3 in a loop for hours. - Passed exhausive test. Change-Id: I75f252e43006c6ff6980800e3254672de396b318 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h 2 files changed, 15 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16849/3 -- To view, visit http://gerrit.cloudera.org:8080/16849 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I75f252e43006c6ff6980800e3254672de396b318 Gerrit-Change-Number: 16849 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10336: Coordinator return incorrect error to client
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/16849 ) Change subject: IMPALA-10336: Coordinator return incorrect error to client .. Patch Set 2: Right. I could reproduce the situation by adding some artificial delays in 3 places: QueryExecMgr.StartQuery(), Coordinator.UpdateBackendExecStatus(), and Coordinator::StartBackendExec() when running test case test_scanners.py::TestOrc::test_type_conversions_hive3. -- To view, visit http://gerrit.cloudera.org:8080/16849 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I75f252e43006c6ff6980800e3254672de396b318 Gerrit-Change-Number: 16849 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 17 Dec 2020 02:50:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10381: Fix overloading of --ldap passwords in clear ok
Thomas Tauber-Marshall has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16829 ) Change subject: IMPALA-10381: Fix overloading of --ldap_passwords_in_clear_ok .. IMPALA-10381: Fix overloading of --ldap_passwords_in_clear_ok The --ldap_passwords_in_clear_ok flag was originally intended to allow configurations where Impala connects to LDAP without SSL, for testing purposes. Since then, two other uses of the flag have been added: 1) for controlling whether cookies include the 'Secure' attribute and 2) for controlling whether the webserver allows LDAP auth to be enabled if SSL isn't. Some use cases may prefer to control these values separately, so this patch separates them into three different flags. Testing: - Updated existing tests that use --ldap_passwords_in_clear_ok Change-Id: I12ee3a857365c0fca261a8b06de2321ed6b40a83 Reviewed-on: http://gerrit.cloudera.org:8080/16829 Reviewed-by: Impala Public Jenkins Tested-by: Thomas Tauber-Marshall --- M be/src/rpc/authentication-util.cc M be/src/util/webserver-test.cc M be/src/util/webserver.cc M fe/src/test/java/org/apache/impala/customcluster/LdapJdbcTest.java M fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java 5 files changed, 19 insertions(+), 12 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved Thomas Tauber-Marshall: Verified -- To view, visit http://gerrit.cloudera.org:8080/16829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I12ee3a857365c0fca261a8b06de2321ed6b40a83 Gerrit-Change-Number: 16829 Gerrit-PatchSet: 6 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10381: Fix overloading of --ldap passwords in clear ok
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16829 ) Change subject: IMPALA-10381: Fix overloading of --ldap_passwords_in_clear_ok .. Patch Set 5: Verified+1 Failed again due to IMPALA-10398. Since everything else is passing and this is low-risk I'm going to go ahead and push it in -- To view, visit http://gerrit.cloudera.org:8080/16829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12ee3a857365c0fca261a8b06de2321ed6b40a83 Gerrit-Change-Number: 16829 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 17 Dec 2020 01:55:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10381: Fix overloading of --ldap passwords in clear ok
Thomas Tauber-Marshall has removed a vote on this change. Change subject: IMPALA-10381: Fix overloading of --ldap_passwords_in_clear_ok .. Removed Verified-1 by Impala Public Jenkins -- To view, visit http://gerrit.cloudera.org:8080/16829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I12ee3a857365c0fca261a8b06de2321ed6b40a83 Gerrit-Change-Number: 16829 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10211 (Part 1): Add support for role-related statements
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16837 ) Change subject: IMPALA-10211 (Part 1): Add support for role-related statements .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7874/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16837 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2b204e62a1d8ae1932d955b4efc28be22202860 Gerrit-Change-Number: 16837 Gerrit-PatchSet: 7 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 17 Dec 2020 01:08:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9865: part 2/2: add verbosity to profile tool
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16881 ) Change subject: IMPALA-9865: part 2/2: add verbosity to profile tool .. Patch Set 5: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/7873/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16881 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I82618a813e29af7996dfaed78873b2a73bc0231d Gerrit-Change-Number: 16881 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 17 Dec 2020 00:59:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10211 (Part 1): Add support for role-related statements
Fang-Yu Rao has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/16837 ) Change subject: IMPALA-10211 (Part 1): Add support for role-related statements .. IMPALA-10211 (Part 1): Add support for role-related statements This patch adds the support for the following role-related statements. 1. CREATE ROLE . 2. DROP ROLE . 3. GRANT ROLE TO GROUP . 4. REVOKE ROLE FROM GROUP . 5. GRANT ON TO ROLE . 6. REVOKE ON FROM ROLE . 7. SHOW GRANT ROLE ON . 8. SHOW ROLES. 9. SHOW CURRENT ROLES. 10. SHOW ROLE GRANT GROUP . To support the first 4 statements, we implemented the methods of createRole()/dropRole(), and grantRoleToGroup()/revokeRoleFromGroup() with their respective API calls provided by Ranger. To support the 5th and 6th statements, we modified createGrantRevokeRequest() so that the cases in which the grantee or revokee is a role could be processed. We slightly extended getPrivileges() so as to include the case when the principal is a role for the 7th statement. For the last 3 statements, to make Impala's behavior consistent with that when Sentry was the authorization provider, we based our implementation on SentryImpaladAuthorizationManager#getRoles() at https://gerrit.cloudera.org/c/15833/8/fe/src/main/java/org/apache/impala/authorization/sentry/SentryImpaladAuthorizationManager.java, which was removed in IMPALA-9708 when we dropped the support for Sentry. To test the implemented functionalities, we based our test cases on those at https://gerrit.cloudera.org/c/15833/8/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test. We note that before our tests could be automatically run in a Kerberized environment (IMPALA-9360), in order to run the statements of CREATE/DROP ROLE , GRANT/REVOKE ROLE TO/FROM GROUP , and SHOW ROLES, we revised security-applicationContext.xml, one of the files needed when the Ranger server is started, so that the corresponding API calls could be performed in a non-Kerberized environment. During the process of adding test cases to grant_revoke.test, we found the following differences in Impala's behavior between the case when Ranger is the authorization provider and that when Sentry is the authorization provider. Specifically, we have the following two major differences. 1. Before dropping a role in Ranger, we have to remove all the privileges granted to the role in advance, which is not the case when Sentry is the authorization provider. 2. The resource has to be specified for the statement of SHOW GRANT ROLE ON , which is different when Sentry is the authorization provider. This could be partly due to the fact that there is no API provided by Ranger that allows Impala to directly retrieve the list of all privileges granted to a specified role. Due to the differences in Impala's behavior described above, we had to revise the test cases in grant_revoke.test accordingly. On the other hand, to include as many test cases that were in the original grant_revoke.test as possible, we had to explicitly add the test section of 'USER' to specify the connecting user to Impala for some queries that require the connecting user to be a Ranger administrator, e.g., CREATE/DROP ROLE and GRANT/REVOKE TO/FROM GROUP . The user has to be 'admin' in the current grant_revoke.test, whereas it could be the default user 'getuser()' in the original grant_revoke.test because previously 'getuser()' was also a Sentry administrator. Moreover, for some test cases, we had to explicitly alter the owner of a resource in the original grant_revoke.test when we would like to prevent the original owner of the resource, e.g., the creator of the resource, from accessing the resource since the original grant_revoke.test was run without object ownership being taken into consideration. Testing: - Briefly verified that the implemented statements work as expected in a Kerberized cluster. - Added the decorator of @pytest.mark.execute_serially to each test in test_ranger.py since we have observed that in some cases, e.g., if we are only running the E2E tests in the Jenkins environment, some tests do not seem to be executed sequentially. - Verified that test_ranger.py passes in a local development environment. Change-Id: Ic2b204e62a1d8ae1932d955b4efc28be22202860 --- M bin/create-test-configuration.sh M fe/src/main/java/org/apache/impala/authorization/ranger/RangerCatalogdAuthorizationManager.java M fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java M fe/src/main/java/org/apache/impala/authorization/ranger/RangerUtil.java M fe/src/test/java/org/apache/impala/authorization/AuthorizationTestBase.java A testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test M tests/authorization/test_ranger.py 7 files changed, 1,703 insertions(+), 62 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/16837/7 -- To view,
[Impala-ASF-CR] IMPALA-9865: part 2/2: add verbosity to profile tool
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16881 ) Change subject: IMPALA-9865: part 2/2: add verbosity to profile tool .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6789/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16881 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I82618a813e29af7996dfaed78873b2a73bc0231d Gerrit-Change-Number: 16881 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 17 Dec 2020 00:44:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9865: part 2/2: add verbosity to profile tool
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16881 to look at the new patch set (#6). Change subject: IMPALA-9865: part 2/2: add verbosity to profile tool .. IMPALA-9865: part 2/2: add verbosity to profile tool Adds a --profile_verbosity option for impala-profile-tool with the following levels: * 0: minimal * 1: legacy - matches old output, this is the default still * 2: default - basic descriptive stats, used for V2 profile. * 3: extended * 4: full Use the profile version in impala-profile-tool to dump the more verbose output for the V2 profile while preserving the same output for the legacy profile. Reduce verbosity of v2 profile output - only include mean/min/max by default. I intend to refine the output at the different verbosity levels for the v2 profiles further as part of IMPALA-9382, it is still fairly noisy. Fix output with/without gen_experimental_profile - there was a small difference in that the summary stats were not output in the averaged profile. Testing: * Add an end-to-end test that generates output for a small profile log and compares against expected files. * Tweak other profile tests to reflect changes to output. Change-Id: I82618a813e29af7996dfaed78873b2a73bc0231d --- M be/src/util/impala-profile-tool.cc M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile-test.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M bin/jenkins/dockerized-impala-run-tests.sh M bin/rat_exclude_files.txt A testdata/impala-profiles/README A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats.expected A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats.expected.json A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats.expected.pretty.json A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats.expected.pretty_extended.json A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_default.expected A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_extended.expected M testdata/workloads/tpch/queries/runtime-profile-aggregated.test A tests/observability/test_profile_tool.py M tests/query_test/test_scanners.py 18 files changed, 95,018 insertions(+), 124 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/16881/6 -- To view, visit http://gerrit.cloudera.org:8080/16881 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I82618a813e29af7996dfaed78873b2a73bc0231d Gerrit-Change-Number: 16881 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10211 (Part 1): Add support for role-related statements
Fang-Yu Rao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16837 ) Change subject: IMPALA-10211 (Part 1): Add support for role-related statements .. Patch Set 6: (1 comment) > Patch Set 6: Code-Review+1 > > (1 comment) > > The change looks good to me in general. > About the RANGER Jiras: I would prefer to create them before merging this and > mention them in comments or the commit message. Thanks Csaba! With regard to this, I have created RANGER-3125, RANGER-3126, and RANGER-3127 to keep track of the issues we have found. In addition, I also created IMPALA-10399 regarding the issue of the test test_show_grant_hive_privilege() we encountered. http://gerrit.cloudera.org:8080/#/c/16837/4/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test: http://gerrit.cloudera.org:8080/#/c/16837/4/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@1277 PS4, Line 1277: # Clean up the granted privileges and test roles. > > It would be great if we had some statement like "REVOKE ON http://gerrit.cloudera.org:8080/16837 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2b204e62a1d8ae1932d955b4efc28be22202860 Gerrit-Change-Number: 16837 Gerrit-PatchSet: 6 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 17 Dec 2020 00:37:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9865: part 2/2: add verbosity to profile tool
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16881 to look at the new patch set (#5). Change subject: IMPALA-9865: part 2/2: add verbosity to profile tool .. IMPALA-9865: part 2/2: add verbosity to profile tool Adds a --profile_verbosity option for impala-profile-tool with the following levels: * 0: minimal * 1: legacy - matches old output, this is the default still * 2: default - basic descriptive stats, used for V2 profile. * 3: extended * 4: full Use the profile version in impala-profile-tool to dump the more verbose output for the V2 profile while preserving the same output for the legacy profile. Reduce verbosity of v2 profile output - only include mean/min/max by default. Fix output with/without gen_experimental_profile - there was a small difference in that the summary stats were not output in the averaged profile. Testing: * Add an end-to-end test that generates output for a small profile log and compares against expected files. * Tweak other profile tests to reflect changes to output. Change-Id: I82618a813e29af7996dfaed78873b2a73bc0231d --- M be/src/util/impala-profile-tool.cc M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile-test.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M bin/jenkins/dockerized-impala-run-tests.sh M bin/rat_exclude_files.txt A testdata/impala-profiles/README A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats.expected A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats.expected.json A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats.expected.pretty.json A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats.expected.pretty_extended.json A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_default.expected A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_extended.expected M testdata/workloads/tpch/queries/runtime-profile-aggregated.test A tests/observability/test_profile_tool.py M tests/query_test/test_scanners.py 18 files changed, 95,018 insertions(+), 124 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/16881/5 -- To view, visit http://gerrit.cloudera.org:8080/16881 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I82618a813e29af7996dfaed78873b2a73bc0231d Gerrit-Change-Number: 16881 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 36: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7872/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 36 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 17 Dec 2020 00:18:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 36: (1 comment) http://gerrit.cloudera.org:8080/#/c/16720/36/tests/run-tests.py File tests/run-tests.py: http://gerrit.cloudera.org:8080/#/c/16720/36/tests/run-tests.py@216 PS36, Line 216: # flake8: E265 block comment should start with '# ' -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 36 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 16 Dec 2020 23:57:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Qifan Chen has uploaded a new patch set (#36). ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate This patch adds a new class of predicates called overlap predicates to aid in the determination of whether a Parquet row group or a page overlap with a range computed from an equi hash join. If not, then the entire row group or page are skipped. An overlap predicate exists as a min/max filter. For the following query, the min and max in such a min/max filter are computed with the values from the join column from table 'b' and become fully available when the entire hash table is built. To evaluate the overlap predicate, these two values are compared against the min/max of each row group or page at the scan node for 'a'. select straight_join count(*) from lineitem_sorted_l_shipdate a join [SHUFFLE] lineitem_sorted_l_shipdate b where a.l_shipdate = b.l_receiptdate and b.l_commitdate = "1992-01-31"; An overlap predicate associated with the column type J (in hash table) and scan column type S will be formed when one of the following is true: Both J and S are booleans Both J and S are integers (tinyint, smallint, int, or bigint) Both J and S are approximate numeric (float or double) Both J and S are Decimals with the same precision and scale Both J and S are strings (STRING, CHAR or VARCHAR) Both J and S are date Both J and S are timestamp Like any existing min/max filters, MAX_NUM_RUNTIME_FILTERS query option does not apply to min/max filters created for overlap predicates. The overlap predicates will always be evaluated, after the min/max conjuncts (if any). Two new run-time profile counters are added to report the number of row groups or pages filtered out via the overlap predicates respectively: 1. NumRuntimeFilteredRowGroups 2. NumRuntimeFilteredPages Testing: 1. Unit tested on various column types with TPCH and TPCDS tables. Benefits were significant when the join column on the outer table is sorted, or when the min/max boundary values of the pages or row groups are monotonic; 2. Added new tests in min_max_filters.test to demonstrate the number of filtered out pages and row groups with the two new profile counters; 2. Added new tests in runtime-filter-propagation.test to demonstrate that the overlap predicates work with different column types; 4. Added data type specific overlap method tests in min-max-filter-test.cc; 5. Core testing. TBD in this patch: 1. Performance measurement. To do in follow-up JIRAs: 1. Apply the overlap predicate on partition columns; 2. IR code-gen for various MinMaxFilter::EvalOverlap methods. Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 --- M be/src/exec/exec-node.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/runtime/date-value.cc M be/src/runtime/date-value.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/min-max-filter-test.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test M tests/run-tests.py 29 files changed, 1,631 insertions(+), 241 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16720/36 -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 36 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10381: Fix overloading of --ldap passwords in clear ok
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16829 ) Change subject: IMPALA-10381: Fix overloading of --ldap_passwords_in_clear_ok .. Patch Set 5: Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6787/ -- To view, visit http://gerrit.cloudera.org:8080/16829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12ee3a857365c0fca261a8b06de2321ed6b40a83 Gerrit-Change-Number: 16829 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 16 Dec 2020 22:47:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7871/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 15 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 16 Dec 2020 22:21:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 15: (4 comments) http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py File tests/custom_cluster/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py@20 PS15, Line 20: @SkipIfS3.variable_listing_times flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py@53 PS15, Line 53: l flake8: E501 line too long (146 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py@119 PS15, Line 119: f flake8: E126 continuation line over-indented for hanging indent http://gerrit.cloudera.org:8080/#/c/16549/15/tests/custom_cluster/test_topic_update_frequency.py@151 PS15, Line 151: l flake8: E501 line too long (146 > 90 characters) -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 15 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 16 Dec 2020 22:00:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. IMPALA-6671: Skip locked tables from topic updates This change adds a mechanism for topic-update thread to skip a table which is locked for more than a configurable interval from the topic updates. This is especially useful in scenarios where long running operations on a locked table (refresh, recover partitions, compute stats) block the topic update thread. This causes unrelated queries which are waiting on metadata via topic updates (catalog-v1 mode) to unnecessarily block. The ideal solution of this problem would be to make HdfsTable immutable so that there is no need for table lock. But that is large change and not easily portable to older releases of Impala. It would be taken up as a separate patch. This change introduces 2 new configurations for catalogd: 1. topic_update_tbl_max_wait_time_ms: This defines the maximum time in msecs the topic update thread waits on a locked table before skipping the table from that iteration of topic updates. The default value is 500. If this configuration is set to 0 the lock with timeout for topic update thread is disabled. 2. catalog_max_lock_skipped_topic_updates: This defines the maximum number of distinct lock operations which are skipped by topic update thread due to lock contention. Once this limit is reached, topic update thread will block until it acquires the table lock and adds it to the updates. Testing: 1. Added a test case which introduces a simulated delay in a few potentially long running statements. This causes the table to be locked for a long time. The topic update thread skips that table from updates and unrelated queries are unblocked since they receive the required metadata from updates. 2. Added a test where multiple threads run blocking statements in a loop to stress the table lock. It makes sure that topic update thread is not starved and eventually blocks on table lock by hitting the limit defined by catalog_max_lock_skipped_topic_updates. 3. Ran exhaustive tests with default configurations. Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/analysis/CopyTestCaseStmt.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java A tests/custom_cluster/test_topic_update_frequency.py 13 files changed, 697 insertions(+), 149 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16549/15 -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 15 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10398: Altering an Iceberg table might throw NullPointerException
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16882 ) Change subject: IMPALA-10398: Altering an Iceberg table might throw NullPointerException .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16882 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e8b7c68898bd13c5288b466d5bf3d258392 Gerrit-Change-Number: 16882 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 16 Dec 2020 21:59:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10398: Altering an Iceberg table might throw NullPointerException
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16882 ) Change subject: IMPALA-10398: Altering an Iceberg table might throw NullPointerException .. IMPALA-10398: Altering an Iceberg table might throw NullPointerException IcebergSchemaConverter has a static thread local member which might not have a value in the current thread when nextId() is invoked. In that case the thread local integer's get() method returns a null and we get a NullPointerException when we want to convert it to a builtin int. This patch initializes the thread local variable with an anonymous subclass of ThreadLocal that overrides the 'initialValue()' method which returns 0 instead of null. Testing * tested manually by restarting the impala cluster and issuing ALTER TABLE .. ADD COLUMNS * looped test_alter_iceberg_tables for a while Change-Id: I4e8b7c68898bd13c5288b466d5bf3d258392 Reviewed-on: http://gerrit.cloudera.org:8080/16882 Reviewed-by: Gabor Kaszab Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java 1 file changed, 6 insertions(+), 1 deletion(-) Approvals: Gabor Kaszab: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/16882 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I4e8b7c68898bd13c5288b466d5bf3d258392 Gerrit-Change-Number: 16882 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 35: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7870/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 35 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 16 Dec 2020 21:21:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 35: (1 comment) http://gerrit.cloudera.org:8080/#/c/16720/35/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/16720/35/be/src/exec/hdfs-scan-node-base.h@286 PS35, Line 286: /// implicitly holds the max value of the data item. line has trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 35 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 16 Dec 2020 21:00:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Qifan Chen has uploaded a new patch set (#35). ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate This patch adds a new class of predicates called overlap predicates to aid in the determination of whether a Parquet row group or a page overlap with a range computed from an equi hash join. If not, then the entire row group or page are skipped. An overlap predicate exists as a min/max filter. For the following query, the min and max in such a min/max filter are computed with the values from the join column from table 'b' and become fully available when the entire hash table is built. To evaluate the overlap predicate, these two values are compared against the min/max of each row group or page at the scan node for 'a'. select straight_join count(*) from lineitem_sorted_l_shipdate a join [SHUFFLE] lineitem_sorted_l_shipdate b where a.l_shipdate = b.l_receiptdate and b.l_commitdate = "1992-01-31"; An overlap predicate associated with the column type J (in hash table) and scan column type S will be formed when one of the following is true: Both J and S are booleans Both J and S are integers (tinyint, smallint, int, or bigint) Both J and S are approximate numeric (float or double) Both J and S are Decimals with the same precision and scale Both J and S are strings (STRING, CHAR or VARCHAR) Both J and S are date Both J and S are timestamp Like any existing min/max filters, MAX_NUM_RUNTIME_FILTERS query option does not apply to min/max filters created for overlap predicates. The overlap predicates will always be evaluated, after the min/max conjuncts (if any). Two new run-time profile counters are added to report the number of row groups or pages filtered out via the overlap predicates respectively: 1. NumRuntimeFilteredRowGroups 2. NumRuntimeFilteredPages Testing: 1. Unit tested on various column types with TPCH and TPCDS tables. Benefits were significant when the join column on the outer table is sorted, or when the min/max boundary values of the pages or row groups are monotonic; 2. Added new tests in min_max_filters.test for join column type compatibility and to demonstrate the number of filtered out pages and row groups with the two new profile counters; 3. Added data type specific overlap method tests in min-max-filter-test.cc; 4. Core testing. TBD in this patch: 1. Performance measurement. To do in follow-up JIRAs: 1. Apply the overlap predicate on partition columns; 2. IR code-gen for various MinMaxFilter::EvalOverlap methods. Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 --- M be/src/exec/exec-node.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/runtime/date-value.cc M be/src/runtime/date-value.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/min-max-filter-test.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test 28 files changed, 1,628 insertions(+), 236 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16720/35 -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 35 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16825 ) Change subject: IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only .. Patch Set 13: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/16825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 Gerrit-Change-Number: 16825 Gerrit-PatchSet: 13 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 16 Dec 2020 20:39:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10391: Fix LIRS edge case for single unprotected entry
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16857 ) Change subject: IMPALA-10391: Fix LIRS edge case for single unprotected entry .. IMPALA-10391: Fix LIRS edge case for single unprotected entry When an unprotected entry is not in the recency list, a lookup will cause it to be moved to be the newest entry in the unprotected list. The fix for IMPALA-10127 introduced a regression when this happens when there is exactly on entry in the unprotected list. The code currently calls RemoveFromUnprotectedList() followed by AddToUnprotectedList(). This now fails because it is doing these operations without manipulating the num_unprotected_ count. RemoveFromUnprotectedList() clears out unprotected_list_front_, because num_unprotected_ is 1. However, AddToUnprotectedList() does not set it back, because it only does that if num_unprotected_ is 0, and the count is not changing. This skips the remove/add in this case if there is exactly one unprotected entry in the list. Testing: - Added a backend test for this specific case and verfied that it failed before the fix and passes now Change-Id: I8d21b619811a1a7baab1a92790f2ffc03e949131 Reviewed-on: http://gerrit.cloudera.org:8080/16857 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins --- M be/src/util/cache/lirs-cache-test.cc M be/src/util/cache/lirs-cache.cc 2 files changed, 44 insertions(+), 2 deletions(-) Approvals: Joe McDonnell: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/16857 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I8d21b619811a1a7baab1a92790f2ffc03e949131 Gerrit-Change-Number: 16857 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7869/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 13 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 16 Dec 2020 19:57:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16884 ) Change subject: IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7868/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16884 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d Gerrit-Change-Number: 16884 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 16 Dec 2020 19:36:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 13: (3 comments) http://gerrit.cloudera.org:8080/#/c/16549/13/tests/custom_cluster/test_topic_update_frequency.py File tests/custom_cluster/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16549/13/tests/custom_cluster/test_topic_update_frequency.py@52 PS13, Line 52: l flake8: E501 line too long (146 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/13/tests/custom_cluster/test_topic_update_frequency.py@118 PS13, Line 118: f flake8: E126 continuation line over-indented for hanging indent http://gerrit.cloudera.org:8080/#/c/16549/13/tests/custom_cluster/test_topic_update_frequency.py@150 PS13, Line 150: l flake8: E501 line too long (146 > 90 characters) -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 13 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 16 Dec 2020 19:35:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 13: PS13 is a rebase and some minor comment updates. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 13 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 16 Dec 2020 19:35:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. IMPALA-6671: Skip locked tables from topic updates This change adds a mechanism for topic-update thread to skip a table which is locked for more than a configurable interval from the topic updates. This is especially useful in scenarios where long running operations on a locked table (refresh, recover partitions, compute stats) block the topic update thread. This causes unrelated queries which are waiting on metadata via topic updates (catalog-v1 mode) to unnecessarily block. The ideal solution of this problem would be to make HdfsTable immutable so that there is no need for table lock. But that is large change and not easily portable to older releases of Impala. It would be taken up as a separate patch. This change introduces 2 new configurations for catalogd: 1. topic_update_tbl_max_wait_time_ms: This defines the maximum time in msecs the topic update thread waits on a locked table before skipping the table from that iteration of topic updates. The default value is 500. If this configuration is set to 0 the lock with timeout for topic update thread is disabled. 2. catalog_max_lock_skipped_topic_updates: This defines the maximum number of distinct lock operations which are skipped by topic update thread due to lock contention. Once this limit is reached, topic update thread will block until it acquires the table lock and adds it to the updates. Testing: 1. Added a test case which introduces a simulated delay in a few potentially long running statements. This causes the table to be locked for a long time. The topic update thread skips that table from updates and unrelated queries are unblocked since they receive the required metadata from updates. 2. Added a test where multiple threads run blocking statements in a loop to stress the table lock. It makes sure that topic update thread is not starved and eventually blocks on table lock by hitting the limit defined by catalog_max_lock_skipped_topic_updates. 2. Ran exhaustive tests with default configurations. Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/analysis/CopyTestCaseStmt.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java A tests/custom_cluster/test_topic_update_frequency.py 13 files changed, 696 insertions(+), 149 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16549/13 -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 13 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16884 Change subject: IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString .. IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString BufferedTupleStream::DebugString() iterate std::list that can potentially grow very large. As consequent, the returned string can grow large as well and cause a problem as previously happen in IMPALA-9851. With this patch, BufferedTupleStream::DebugString() only include maximum of 100 first pages of page list. Testing: - Add new be test SimpleNullStreamTest.ShortDebugString in buffered-tuple-stream-test.cc - Pass core tests Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d --- M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h 3 files changed, 71 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/16884/1 -- To view, visit http://gerrit.cloudera.org:8080/16884 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d Gerrit-Change-Number: 16884 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
[Impala-ASF-CR] IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16825 ) Change subject: IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only .. Patch Set 12: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 Gerrit-Change-Number: 16825 Gerrit-PatchSet: 12 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 16 Dec 2020 18:38:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG@35 PS10, Line 35: Testing: > Good idea. I will loop them overnight and report back. I also made some cha I looped them overnight for a 100 iterations without any failures. I plan to ignore this test for s3 builds in the next update. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 10 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 16 Dec 2020 18:13:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 ) Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions .. Patch Set 14: Code-Review+2 (3 comments) I've left some minor comments, but the change looks great overall! http://gerrit.cloudera.org:8080/#/c/16741/14/be/src/exprs/iceberg-functions-ir.cc File be/src/exprs/iceberg-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/16741/14/be/src/exprs/iceberg-functions-ir.cc@75 PS14, Line 75: return TruncatePartitionTransformDecimalImpl(input.val4, width.val); TruncatePartitionTransformNumericImpl handles negative overflow, i.e. when input.val < 0 && result.val >= 0. I think we should do it here is as well. http://gerrit.cloudera.org:8080/#/c/16741/14/be/src/exprs/iceberg-functions-ir.cc@163 PS14, Line 163: c_str nit: it doesn't really make a difference, but buffer.data() could be used instead which maybe expresses the intent better http://gerrit.cloudera.org:8080/#/c/16741/14/be/src/util/bit-util.h File be/src/util/bit-util.h: http://gerrit.cloudera.org:8080/#/c/16741/14/be/src/util/bit-util.h@363 PS14, Line 363: buffer.append(_to_save, 1); nit: string also has push_back member function, so this could be just buffer.push_back(value_to_save); -- To view, visit http://gerrit.cloudera.org:8080/16741 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd Gerrit-Change-Number: 16741 Gerrit-PatchSet: 14 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 16 Dec 2020 17:34:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10381: Fix overloading of --ldap passwords in clear ok
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16829 ) Change subject: IMPALA-10381: Fix overloading of --ldap_passwords_in_clear_ok .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6787/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12ee3a857365c0fca261a8b06de2321ed6b40a83 Gerrit-Change-Number: 16829 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 16 Dec 2020 17:15:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10381: Fix overloading of --ldap passwords in clear ok
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16829 ) Change subject: IMPALA-10381: Fix overloading of --ldap_passwords_in_clear_ok .. Patch Set 5: verify failed due to IMPALA-10398 -- To view, visit http://gerrit.cloudera.org:8080/16829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12ee3a857365c0fca261a8b06de2321ed6b40a83 Gerrit-Change-Number: 16829 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 16 Dec 2020 17:14:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16825 ) Change subject: IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7867/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 Gerrit-Change-Number: 16825 Gerrit-PatchSet: 13 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 16 Dec 2020 17:13:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 ) Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7866/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16741 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd Gerrit-Change-Number: 16741 Gerrit-PatchSet: 14 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 16 Dec 2020 17:03:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16825 ) Change subject: IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only .. Patch Set 12: (7 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/16825/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16825/11//COMMIT_MSG@35 PS11, Line 35: TODO: : * Current change includes some parts of IMPALA-10384 which needs to :be removed once https://gerrit.cloudera.org/#/c/16850/ is merged > https://gerrit.cloudera.org/#/c/16850/ is already merged Done http://gerrit.cloudera.org:8080/#/c/16825/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16825/12//COMMIT_MSG@12 PS12, Line 12: Identity-partitioned Iceberg tables are similar to regular : partitioned tables > Can you mention the difference in INSERT syntax? I mean that there is no PA Done http://gerrit.cloudera.org:8080/#/c/16825/11/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/16825/11/be/src/exec/hdfs-table-sink.cc@157 PS11, Line 157: // For Iceberg tables we only have a single partition descriptor. : if (IsIceberg()) break; > This is the same as skipping the whole for loop, right? I would prefer to p Extracted the loop to a separate function. http://gerrit.cloudera.org:8080/#/c/16825/11/be/src/exec/hdfs-table-sink.cc@579 PS11, Line 579: it != partition_descriptor_map_.end() > It it possible for this to succeed if IsIceberg() is true? If not, then I t Done http://gerrit.cloudera.org:8080/#/c/16825/11/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: http://gerrit.cloudera.org:8080/#/c/16825/11/be/src/runtime/descriptors.cc@250 PS11, Line 250: TIcebergPartitionSpec > Maybe const ref would be better to avoid copy. Thanks for catching this. Some programming in Java and Python and I surely forget these peculiarities. http://gerrit.cloudera.org:8080/#/c/16825/11/testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test: http://gerrit.cloudera.org:8080/#/c/16825/11/testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test@4 PS11, Line 4: partitioend > typo Done http://gerrit.cloudera.org:8080/#/c/16825/12/testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test: http://gerrit.cloudera.org:8080/#/c/16825/12/testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test@128 PS12, Line 128: select id, bool_col, int_col, bigint_col, float_col, double_col, > Can you add test here or in iceberg-negative.test where the insert is rejec Done -- To view, visit http://gerrit.cloudera.org:8080/16825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 Gerrit-Change-Number: 16825 Gerrit-PatchSet: 12 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 16 Dec 2020 16:59:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16825 to look at the new patch set (#13). Change subject: IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only .. IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only This patch adds support to INSERT INTO identity-partitioned Iceberg tables. Identity-partitioned Iceberg tables are similar to regular partitioned tables, they are even stored in the same directory structure. The difference is that the data files still store the partitioning columns. The INSERT INTO syntax is similar to the syntax for non-partitioned tables, i.e.: INSERT INTO VALUES (, , , ...); Or, INSERT INTO SELECT , , ... FROM (please note that we don't use the PARTITION keyword) The values must be in column order corresponding to the table schema. Impala will automatically create/find the partitions based on the Iceberg partition spec. Partitioned Iceberg tables are stored as non-partitioned tables in the Hive Metastore (similarly to partitioned Kudu tables). However, the InsertStmt still generates the partition expressions for them. These partition expressions are used to shuffle and sort the input data so we don't end up writing too many files. The HdfsTableSink also uses the partition expressions to write the data files with the proper partition paths. Iceberg is able to parse the partition paths to generate the corresponding metadata for the partitions. This happens at the end in IcebergCatalogOpExecutor. Testing: * added planner test to verify shuffling and sorting * added negative tests for unsupported features like PARTITION clause and non-identity partition transforms * e2e tests with partitioned inserts Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 --- M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/runtime/coordinator.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/dml-exec-state.cc M be/src/service/client-request-state.cc M common/fbs/IcebergObjects.fbs M common/thrift/CatalogService.thrift M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test A testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test M tests/query_test/test_iceberg.py 27 files changed, 588 insertions(+), 91 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/16825/13 -- To view, visit http://gerrit.cloudera.org:8080/16825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 Gerrit-Change-Number: 16825 Gerrit-PatchSet: 13 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng
[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 ) Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions .. Patch Set 14: (3 comments) http://gerrit.cloudera.org:8080/#/c/16741/13/be/src/exprs/iceberg-functions-ir.cc File be/src/exprs/iceberg-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/16741/13/be/src/exprs/iceberg-functions-ir.cc@144 PS13, Line 144: std::string buffer; > Maybe we can avoid dynamic memory allocations if we just used std::string a Done http://gerrit.cloudera.org:8080/#/c/16741/13/be/src/util/bit-util.h File be/src/util/bit-util.h: http://gerrit.cloudera.org:8080/#/c/16741/13/be/src/util/bit-util.h@355 PS13, Line 355: . -1 > Maybe we could just return the vector. I wanted to avoid copy-ing the vector, but sure, I can return it as a return value as well. Done. http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h File be/src/util/bit-util.h: http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h@357 PS12, Line 357: static std::string IntToByteBuff > I think we don't need the negation at L357 and the multiplication with -1 a Nice solution! I'll use this one. -- To view, visit http://gerrit.cloudera.org:8080/16741 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd Gerrit-Change-Number: 16741 Gerrit-PatchSet: 14 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 16 Dec 2020 16:46:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
Hello Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16741 to look at the new patch set (#14). Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions .. IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions This patch implements Truncate and Bucket partition transforms in Impala BE as built-in functions. The expectation is that these functions give the same result as Iceberg's implementation of the same functions. These built-in functions are invisible so users won't be able to invoke them e.g. from impala-shell. Truncate: - Supported types are IntVal, BigIntVal, StringVal, DecimalVal. - Receives an input from the above types and a width. - Returns the same type as the input. - Expected behaviour is explained here: https://iceberg.apache.org/spec/#truncate-transform-details Bucket: - Supported types are IntVal, BigIntVal, StringVal, DecimalVal, DateVal, TimestampVal. - Receives an input from the above types and the number of buckets as IntVal. - Returns IntVal. - Expected behaviour is explained here: https://iceberg.apache.org/spec/#bucket-transform-details Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd --- M be/src/codegen/impala-ir.cc M be/src/exprs/CMakeLists.txt A be/src/exprs/iceberg-functions-ir.cc A be/src/exprs/iceberg-functions-test.cc A be/src/exprs/iceberg-functions.h M be/src/exprs/scalar-expr-evaluator.cc A be/src/thirdparty/murmurhash/MurmurHash3.cpp A be/src/thirdparty/murmurhash/MurmurHash3.h A be/src/thirdparty/murmurhash/README.md M be/src/util/bit-util-test.cc M be/src/util/bit-util.h M bin/rat_exclude_files.txt M bin/run_clang_tidy.sh M common/function-registry/impala_functions.py 14 files changed, 1,438 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/16741/14 -- To view, visit http://gerrit.cloudera.org:8080/16741 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd Gerrit-Change-Number: 16741 Gerrit-PatchSet: 14 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10211 (Part 1): Add support for role-related statements
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16837 ) Change subject: IMPALA-10211 (Part 1): Add support for role-related statements .. Patch Set 6: Code-Review+1 (1 comment) The change looks good to me in general. About the RANGER Jiras: I would prefer to create them before merging this and mention them in comments or the commit message. http://gerrit.cloudera.org:8080/#/c/16837/4/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test: http://gerrit.cloudera.org:8080/#/c/16837/4/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@1277 PS4, Line 1277: # Clean up the granted privileges and test roles. > It would be great if we had some statement like "REVOKE ON > FROM ROLE IF EXISTS" in Impala. This could be done by calling these cleanup statements in the .py file and swallow the exception if there is an error. Calling revoke on a non-existing privilege doesn't have any side effect AFAIK. -- To view, visit http://gerrit.cloudera.org:8080/16837 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2b204e62a1d8ae1932d955b4efc28be22202860 Gerrit-Change-Number: 16837 Gerrit-PatchSet: 6 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 16 Dec 2020 16:37:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10398: Altering an Iceberg table might throw NullPointerException
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16882 ) Change subject: IMPALA-10398: Altering an Iceberg table might throw NullPointerException .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6786/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16882 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e8b7c68898bd13c5288b466d5bf3d258392 Gerrit-Change-Number: 16882 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 16 Dec 2020 16:29:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10398: Altering an Iceberg table might throw NullPointerException
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16882 ) Change subject: IMPALA-10398: Altering an Iceberg table might throw NullPointerException .. Patch Set 1: Code-Review+2 Nice catch! -- To view, visit http://gerrit.cloudera.org:8080/16882 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e8b7c68898bd13c5288b466d5bf3d258392 Gerrit-Change-Number: 16882 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 16 Dec 2020 16:27:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16825 ) Change subject: IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only .. Patch Set 12: (7 comments) http://gerrit.cloudera.org:8080/#/c/16825/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16825/11//COMMIT_MSG@35 PS11, Line 35: TODO: : * Current change includes some parts of IMPALA-10384 which needs to :be removed once https://gerrit.cloudera.org/#/c/16850/ is merged https://gerrit.cloudera.org/#/c/16850/ is already merged http://gerrit.cloudera.org:8080/#/c/16825/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16825/12//COMMIT_MSG@12 PS12, Line 12: Identity-partitioned Iceberg tables are similar to regular : partitioned tables Can you mention the difference in INSERT syntax? I mean that there is no PARTITION(x=y ... ) in the statement, instead select list -> partition mapping occurs bases on column order if I understand correctly. http://gerrit.cloudera.org:8080/#/c/16825/11/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/16825/11/be/src/exec/hdfs-table-sink.cc@157 PS11, Line 157: // For Iceberg tables we only have a single partition descriptor. : if (IsIceberg()) break; This is the same as skipping the whole for loop, right? I would prefer to put the for loop in an if block, or if we want to avoid additional nesting, the extract the loop to a separate function. http://gerrit.cloudera.org:8080/#/c/16825/11/be/src/exec/hdfs-table-sink.cc@579 PS11, Line 579: it != partition_descriptor_map_.end() It it possible for this to succeed if IsIceberg() is true? If not, then I think it is clearer to handle the IsIceberg() case first. http://gerrit.cloudera.org:8080/#/c/16825/11/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: http://gerrit.cloudera.org:8080/#/c/16825/11/be/src/runtime/descriptors.cc@250 PS11, Line 250: TIcebergPartitionSpec Maybe const ref would be better to avoid copy. http://gerrit.cloudera.org:8080/#/c/16825/11/testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test: http://gerrit.cloudera.org:8080/#/c/16825/11/testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test@4 PS11, Line 4: partitioend typo http://gerrit.cloudera.org:8080/#/c/16825/12/testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test: http://gerrit.cloudera.org:8080/#/c/16825/12/testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test@128 PS12, Line 128: select id, bool_col, int_col, bigint_col, float_col, double_col, Can you add test here or in iceberg-negative.test where the insert is rejected because the types in the select list do not match the types in the column? E.g the columns here could be reordered. -- To view, visit http://gerrit.cloudera.org:8080/16825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 Gerrit-Change-Number: 16825 Gerrit-PatchSet: 12 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 16 Dec 2020 15:26:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10398: Altering an Iceberg table might throw NullPointerException
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16882 ) Change subject: IMPALA-10398: Altering an Iceberg table might throw NullPointerException .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7865/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16882 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e8b7c68898bd13c5288b466d5bf3d258392 Gerrit-Change-Number: 16882 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 16 Dec 2020 14:29:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10398: Altering an Iceberg table might throw NullPointerException
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16882 Change subject: IMPALA-10398: Altering an Iceberg table might throw NullPointerException .. IMPALA-10398: Altering an Iceberg table might throw NullPointerException IcebergSchemaConverter has a static thread local member which might not have a value in the current thread when nextId() is invoked. In that case the thread local integer's get() method returns a null and we get a NullPointerException when we want to convert it to a builtin int. This patch initializes the thread local variable with an anonymous subclass of ThreadLocal that overrides the 'initialValue()' method which returns 0 instead of null. Testing * tested manually by restarting the impala cluster and issuing ALTER TABLE .. ADD COLUMNS * looped test_alter_iceberg_tables for a while Change-Id: I4e8b7c68898bd13c5288b466d5bf3d258392 --- M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java 1 file changed, 6 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/16882/1 -- To view, visit http://gerrit.cloudera.org:8080/16882 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I4e8b7c68898bd13c5288b466d5bf3d258392 Gerrit-Change-Number: 16882 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16825 ) Change subject: IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7864/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 Gerrit-Change-Number: 16825 Gerrit-PatchSet: 12 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 16 Dec 2020 13:23:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16825 ) Change subject: IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only .. Patch Set 12: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6785/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 Gerrit-Change-Number: 16825 Gerrit-PatchSet: 12 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 16 Dec 2020 13:02:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16825 to look at the new patch set (#12). Change subject: IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only .. IMPALA-10380: INSERT INTO Iceberg tables with 'IDENTITY' partitions only This patch adds support to INSERT INTO identity-partitioned Iceberg tables. Identity-partitioned Iceberg tables are similar to regular partitioned tables, they are even stored in the same directory structure. The difference is that the data files still store the partitioning columns. Partitioned Iceberg tables are stored as non-partitioned tables in the Hive Metastore (similarly to partitioned Kudu tables). However, the InsertStmt still generates the partition expressions for them. These partition expressions are used to shuffle and sort the input data so we don't end up writing too many files. The HdfsTableSink also uses the partition expressions to write the data files with the proper partition paths. Iceberg is able to parse the partition paths to generate the corresponding metadata for the partitions. This happens at the end in IcebergCatalogOpExecutor. Testing: * added planner test to verify shuffling and sorting * added negative tests for unsupported features like PARTITION clause and non-identity partition transforms * e2e tests with partitioned inserts TODO: * Current change includes some parts of IMPALA-10384 which needs to be removed once https://gerrit.cloudera.org/#/c/16850/ is merged Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 --- M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/runtime/coordinator.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/dml-exec-state.cc M be/src/service/client-request-state.cc M common/fbs/IcebergObjects.fbs M common/thrift/CatalogService.thrift M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test A testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test M tests/query_test/test_iceberg.py 27 files changed, 558 insertions(+), 83 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/16825/12 -- To view, visit http://gerrit.cloudera.org:8080/16825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 Gerrit-Change-Number: 16825 Gerrit-PatchSet: 12 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng
[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 ) Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/16741/13/be/src/exprs/iceberg-functions-ir.cc File be/src/exprs/iceberg-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/16741/13/be/src/exprs/iceberg-functions-ir.cc@144 PS13, Line 144: std::vector buffer; Maybe we can avoid dynamic memory allocations if we just used std::string and hope that small string optimization kicks in. -- To view, visit http://gerrit.cloudera.org:8080/16741 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd Gerrit-Change-Number: 16741 Gerrit-PatchSet: 13 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 16 Dec 2020 10:56:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 ) Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions .. Patch Set 13: (2 comments) I had some comments about simplifying IntToByteArray, other than that the change lgtm. http://gerrit.cloudera.org:8080/#/c/16741/13/be/src/util/bit-util.h File be/src/util/bit-util.h: http://gerrit.cloudera.org:8080/#/c/16741/13/be/src/util/bit-util.h@355 PS13, Line 355: void Maybe we could just return the vector. http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h File be/src/util/bit-util.h: http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h@357 PS12, Line 357: T value = input; > This is needed because negative numbers are stored in two's complement form I think we don't need the negation at L357 and the multiplication with -1 at L361 because the right shift introduces ones on the left for negative numbers. I tested the following algorithm and it worked well: template vector IntToByteArray(T input) { vector ret; T value = input; for (int i = 0; i < sizeof(value); ++i) { // Applies a mask for a byte range on the input. char value_to_save = value & 0XFF; ret.push_back(value_to_save); value >>= 8; if (value == 0 && value_to_save >= 0) break; if (value == -1 && value_to_save < 0) break; } std::reverse(ret.begin(), ret.end()); return ret; } -- To view, visit http://gerrit.cloudera.org:8080/16741 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd Gerrit-Change-Number: 16741 Gerrit-PatchSet: 13 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 16 Dec 2020 10:50:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9922: A better approach to deal with date's sub-second fractions
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16869 ) Change subject: IMPALA-9922: A better approach to deal with date's sub-second fractions .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/16869/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16869/3//COMMIT_MSG@7 PS3, Line 7: IMPALA-9922: A better approach to deal with date's sub-second fractions Additionally, I'd be more specific in the commit msg header instead of saying "better approach". Saying something like "Flexible fractional second length matching" would hold more details. -- To view, visit http://gerrit.cloudera.org:8080/16869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7 Gerrit-Change-Number: 16869 Gerrit-PatchSet: 3 Gerrit-Owner: fifteencai Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: fifteencai Gerrit-Comment-Date: Wed, 16 Dec 2020 10:19:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7863/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 12 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 16 Dec 2020 10:01:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 12: (4 comments) http://gerrit.cloudera.org:8080/#/c/16549/12/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/16549/12/fe/src/main/java/org/apache/impala/catalog/Table.java@86 PS12, Line 86: private final ReentrantReadWriteLock tableLock_ = new ReentrantReadWriteLock(true /*fair ordering*/); line too long (103 > 90) http://gerrit.cloudera.org:8080/#/c/16549/12/tests/custom_cluster/test_topic_update_frequency.py File tests/custom_cluster/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16549/12/tests/custom_cluster/test_topic_update_frequency.py@52 PS12, Line 52: l flake8: E501 line too long (146 > 90 characters) http://gerrit.cloudera.org:8080/#/c/16549/12/tests/custom_cluster/test_topic_update_frequency.py@118 PS12, Line 118: f flake8: E126 continuation line over-indented for hanging indent http://gerrit.cloudera.org:8080/#/c/16549/12/tests/custom_cluster/test_topic_update_frequency.py@150 PS12, Line 150: l flake8: E501 line too long (146 > 90 characters) -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 12 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 16 Dec 2020 09:39:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. Patch Set 12: (23 comments) http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG@35 PS10, Line 35: > Did you loop any of these tests to make sure they're not flaky? Good idea. I will loop them overnight and report back. I also made some changes to the test to speed them up since they were taking very long. http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG@46 PS10, Line 46: c > nit: 3 Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@212 PS10, Line 212: topicUpdateTblLockMaxWaitTimeM > nit: maxSkippedUpdatesLockContention_ Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@214 PS10, Line 214: l long LOCK_RETRY_TIMEOUT_MS = > nit: topicUpdateTblLockMaxWaitTimeMs_ Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@437 PS10, Line 437:* held when the function returns. Returns false otherwise and no lock is held in > Comment needs a minor update to explain that it gets the write lock. Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@447 PS10, Line 447:*/ > Comment needs update to mention useWriteLock. Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@453 PS10, Line 453: em.c > It may be useful to know the lock type (read/write) in debugging. Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@460 PS10, Line 460: want to unnecessari > Can we directly use the specified timeout here? I think the answer is no an yeah, this 0 timeout is to make sure that we don't also hold the version lock for the timeout duration. I added a comment. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1395 PS10, Line 1395: when topic update thread inspe > I think the timeout should be topicUpdateTblLockMaxWaitTimeMs / maxAttempts yeah, makes sense. Although it adds a bit more to the complexity. Added a comment to explain. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1399 PS10, Line 1399:* @return t > nit: we can return true directly here. Then we don't need the lockAcquired Done http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1416 PS10, Line 1416: // table lock was successfully acquired. We can now release the versionLock. : versionLock_.writeLock().unlock(); : return > Should we update tblVersion to be hdfsTable.getCatalogVersion() in this cas I don't think so. The tblVersion is used later down below to identify if we have skipped this table below before. http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2215 PS10, Line 2215: Db db = getDb(updatedTbl.getDb().getName()); > If the existingTbl has a pendingVersion and both instances are > HdfsTable, we loss the pendingVersion in the updatedTbl. But looks > like it's ok because incrementAndGetCatalogVersion() will always > get a version larger than the pendingVersion. This is just a corner > case that only happens when a table loading is triggered due to > stale writeIdList. It's worth a comment here. Yes, like you mentioned this is okay since the incrementAndGetCatalogVersion will push the table out of topic update window. Also its worth noting that the the addTableToCatalogDelta() call in addDatabaseToCatalogDelta() is on a table object which is received from getAllTables(). getAllTables holds the versionLock's readlock. So the replaceTableIfUnchanged can either occur before the getAllTables() or after. In case of later we are okay since we don't lose the pendingVersion. In case of former, we are okay too since the table will be pushed out of topic-update window. > BTW, what if we use incrementAndGetCatalogVersion() at the end of > table modifications to set its catalog version? Is it a smipler > solution than the pendingVersion solution? We just need to avoid > deadlocks since incrementAndGetCatalogVersion() requires the > catalog versionLock but we are holding the table lock in table
[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates
Vihang Karajgaonkar has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: Skip locked tables from topic updates .. IMPALA-6671: Skip locked tables from topic updates This change adds a mechanism for topic-update thread to skip a table which is locked for more than a configurable interval from the topic updates. This is especially useful in scenarios where long running operations on a locked table (refresh, recover partitions, compute stats) block the topic update thread. This causes unrelated queries which are waiting on metadata via topic updates (catalog-v1 mode) to unnecessarily block. The ideal solution of this problem would be to make HdfsTable immutable so that there is no need for table lock. But that is large change and not easily portable to older releases of Impala. It would be taken up as a separate patch. This change introduces 2 new configurations for catalogd: 1. topic_update_tbl_max_wait_time_ms: This defines the maximum time in msecs the topic update thread waits on a locked table before skipping the table from that iteration of topic updates. The default value is 500. If this configuration is set to 0 the lock with timeout for topic update thread is disabled. 2. catalog_max_lock_skipped_topic_updates: This defines the maximum number of distinct lock operations which are skipped by topic update thread due to lock contention. Once this limit is reached, topic update thread will block until it acquires the table lock and adds it to the updates. Testing: 1. Added a test case which introduces a simulated delay in a few potentially long running statements. This causes the table to be locked for a long time. The topic update thread skips that table from updates and unrelated queries are unblocked since they receive the required metadata from updates. 2. Added a test where multiple threads run blocking statements in a loop to stress the table lock. It makes sure that topic update thread is not starved and eventually blocks on table lock by hitting the limit defined by catalog_max_lock_skipped_topic_updates. 2. Ran exhaustive tests with default configurations. Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/analysis/CopyTestCaseStmt.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java A tests/custom_cluster/test_topic_update_frequency.py 13 files changed, 687 insertions(+), 152 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/16549/12 -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 12 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. Patch Set 3: (12 comments) http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG@11 PS3, Line 11: exceed nit: 'exceeding' instead of exceed http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG@18 PS3, Line 18: backend metrics for RowsReturned. Example from "select count(*) from For this example, can you also add the query option ? I assume it was set to 10M ? http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@230 PS3, Line 230: /// Total num rows produced by each join node Pls indicate what the key of the map is. I am also wondering whether the string (which is the node name) is the only way to identify the node. Is there no numeric id that you can use ? http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@231 PS3, Line 231: per_num_join_rows_produced This could more simply be named 'per_join_rows_produced' (see similar comment elsewhere on the option name) http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc File be/src/runtime/fragment-instance-state.cc: http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@331 PS3, Line 331: // stats join node nit: this comment is a bit confusing. Suggest rephrasing to 'row count stats for a join node' http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@335 PS3, Line 335: rfind(nested_loop_type A bit odd that for hash_type you are passing a second argument of 0 while not doing the same for nested_loop_type ? If there's a reason for that it would be good to add a comment. http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@336 PS3, Line 336: per_num_join_rows_produced[node->name()] = rows_counter->value(); Is there not a unique numeric id that can be used here instead of the node name ? (related comment in coordinator.h) http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/impala-server.cc@2688 PS3, Line 2688: Status err = Status::Expected(TErrorCode::JOIN_ROWS_PRODUCED_LIMIT_EXCEEDED, it would be useful to know which join node (with the node id) in a complex plan exceeded this limit. Is it possible to print that info in the error message ? http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc@991 PS3, Line 991: case TImpalaQueryOptions::NUM_JOIN_ROWS_PRODUCED_LIMIT: { I think the prefix 'NUM' can be removed and simplify the naming. JOIN_ROWS_PRODUCED_LIMIT gives sufficient information that it is about the number of rows. Similar simplification of the variable names can be done in other parts of the code. http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc@996 PS3, Line 996: return Status(Substitute("Invalid rows returned limit: '$0'. " 'rows returned limit' doesn't match the name of the query option. Should this be 'Invalid join rows produced limit'? http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test File testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test: http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test@71 PS3, Line 71: QUERY Suggest adding a test with a mix of joins ..i.e couple of hash joins and a nested loop join where one or more of these might exceed the threshold. http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test@87 PS3, Line 87: in nit: remove trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 16 Dec 2020 08:11:26 + Gerrit-HasComments: Yes