[Impala-ASF-CR] IMPALA-9974: Join elimination based on referential integrity.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16182 ) Change subject: IMPALA-9974: Join elimination based on referential integrity. .. Patch Set 4: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6523/ -- To view, visit http://gerrit.cloudera.org:8080/16182 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd07d45e419175af7d62f6035ba51be841ebc074 Gerrit-Change-Number: 16182 Gerrit-PatchSet: 4 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 07 Oct 2020 06:09:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9974: Join elimination based on referential integrity.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16182 ) Change subject: IMPALA-9974: Join elimination based on referential integrity. .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7375/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16182 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd07d45e419175af7d62f6035ba51be841ebc074 Gerrit-Change-Number: 16182 Gerrit-PatchSet: 4 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 07 Oct 2020 04:22:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9974: Join elimination based on referential integrity.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16182 ) Change subject: IMPALA-9974: Join elimination based on referential integrity. .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/16182/4/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/16182/4/fe/src/main/java/org/apache/impala/planner/JoinNode.java@292 PS4, Line 292: EqJoinConjunctScanSlots slots = EqJoinConjunctScanSlots.create(eqJoinConjunct, analyzer); line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/16182/4/fe/src/main/java/org/apache/impala/planner/JoinNode.java@478 PS4, Line 478: return new EqJoinConjunctScanSlots(eqJoinConjunct, lhsScanSlot, rhsScanSlot, analyzer); line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/16182/4/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java File fe/src/test/java/org/apache/impala/catalog/CatalogTest.java: http://gerrit.cloudera.org:8080/#/c/16182/4/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@523 PS4, Line 523: FeFsTable p2 = (FeFsTable) catalog_.getOrLoadTable("functional", "parent_table_2", "test", null); line too long (101 > 90) http://gerrit.cloudera.org:8080/#/c/16182/4/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@532 PS4, Line 532: FeFsTable child = (FeFsTable) catalog_.getOrLoadTable("functional", "child_table", "test", null); line too long (101 > 90) -- To view, visit http://gerrit.cloudera.org:8080/16182 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd07d45e419175af7d62f6035ba51be841ebc074 Gerrit-Change-Number: 16182 Gerrit-PatchSet: 4 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 07 Oct 2020 04:02:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9974: Join elimination based on referential integrity.
Shant Hovsepian has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16182 Change subject: IMPALA-9974: Join elimination based on referential integrity. .. IMPALA-9974: Join elimination based on referential integrity. Use defined referential dependency constraints during planning. 1. For join ndv calculations involving a single column primary key we now use the table's row count instead of the column's ndv stats which are approximate and incorrect at times. 2. Convert LEFT SEMI JOIN to INNER JOIN if the inner table's primary key is used as the join key. 3. Convert LEFT OUTER JOIN to INNER JOIN if the join key uses a fk/pk relationship, the FK is not nullable. 4. Prune INNER JOINS on fk/pk conditions where the join doesn't affect the output cardinality and there are no predicates or non-key columns in use on the inner side. 5. For a confirmed fk/pk join we include a IS NOT NULL predicate on the foreign key columns. Plan optimizations can be disabled with the query option DISABLE_INFORMATIONAL_CONSTRAINTS Testing: - New constraint rewrite planner test - TPC-DS functional tests - TPC-DS Planner test changes Some TPC-DS 30TB Performance Improvements: - Q72 74.3s -> 24.8s - Q82 13.5s -> 2.2s - Q84 19.33s -> 6.07s - Q85 29s -> 6.33s Change-Id: Idd07d45e419175af7d62f6035ba51be841ebc074 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java M fe/src/main/java/org/apache/impala/analysis/SelectList.java M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java A fe/src/main/java/org/apache/impala/analysis/TableConstraintRewriter.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/SqlConstraints.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/bin/compute-table-stats.sh M testdata/data/child_table.txt M testdata/data/parent_table.txt M testdata/data/parent_table_2.txt M testdata/datasets/functional/functional_schema_template.sql A testdata/workloads/functional-planner/queries/PlannerTest/constraint-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q25.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q28.test M testdata/workloads/functi
[Impala-ASF-CR] IMPALA-10189: don't reload partitions for drop stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16516 ) Change subject: IMPALA-10189: don't reload partitions for drop stats .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d Gerrit-Change-Number: 16516 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 07 Oct 2020 00:34:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10189: don't reload partitions for drop stats
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16516 ) Change subject: IMPALA-10189: don't reload partitions for drop stats .. IMPALA-10189: don't reload partitions for drop stats This patch avoids reloading partition metadata from the HMS or filesystems when partition stats are changed, by a DROP STATS command. bulkAlterPartitions() is modified to allow callers to control whether partitions are marked dirty (and therefore reloaded). DDLs that do not mark partitions dirty need to correctly update the in-memory state of the Partition object so that it will be the same as it would be after a reload from the HMS. In this case, numRows_ was not previously updated, but now needs to be updated. Testing: * Ran exhaustive tests * Manually tested some scenarios with modifications to partitions do in Hive outside of Impala. The previous behaviour is preserved (i.e. cached partition state in Impala overwrites the Hive modifictions). * Manually tested dropping and computing regular and incremental stats and inspected the results in both Impala and Hive. We already have automated test coverage for these scenarios as well. Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d Reviewed-on: http://gerrit.cloudera.org:8080/16516 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java 3 files changed, 79 insertions(+), 37 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d Gerrit-Change-Number: 16516 Gerrit-PatchSet: 9 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10219: Add query option to simulate HDFS listing delay
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16548 ) Change subject: IMPALA-10219: Add query option to simulate HDFS listing delay .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/16548/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16548/2//COMMIT_MSG@10 PS2, Line 10: SIMULATE_CATALOGD_HDFS_LISTING_DELAY_MS > I meant "...in the catalog service" A pointer for debug actions - this is where they are actually implemented: https://github.com/apache/impala/blob/master/be/src/util/debug-util.cc#L354 SLEEP is good for what we want here - the others could be implemented later when they are actually needed. http://gerrit.cloudera.org:8080/#/c/16548/2/tests/custom_cluster/test_topic_update_frequency.py File tests/custom_cluster/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16548/2/tests/custom_cluster/test_topic_update_frequency.py@24 PS2, Line 24: CustomClusterTestSuite > This change was done to simulate certain conditions for development of patc It doesn't have to be a custom cluster test for debug actions, as debug_actions is a query option. The test should be marked with @pytest.mark.execute_serially though to avoid interfering with parallel tests by locking catalog for a long time. It can stay as a custom_cluster test if it makes merging with https://gerrit.cloudera.org/#/c/16549/ easier, and we can speed up the test later by making it non custom cluster if it turns out to be possible. -- To view, visit http://gerrit.cloudera.org:8080/16548 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia7196b1ce76415a5faf3fa8575a26d22b2bf50b1 Gerrit-Change-Number: 16548 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 06 Oct 2020 23:23:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16524 ) Change subject: IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking .. Patch Set 5: Code-Review+2 Thanks for the changes! -- To view, visit http://gerrit.cloudera.org:8080/16524 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b Gerrit-Change-Number: 16524 Gerrit-PatchSet: 5 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 06 Oct 2020 23:11:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10188: Remove unused WebDAV functions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16538 ) Change subject: IMPALA-10188: Remove unused WebDAV functions .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b16116e86b6d39112eace0f3a783aa7f2319a53 Gerrit-Change-Number: 16538 Gerrit-PatchSet: 3 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 06 Oct 2020 21:47:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10188: Remove unused WebDAV functions
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16538 ) Change subject: IMPALA-10188: Remove unused WebDAV functions .. IMPALA-10188: Remove unused WebDAV functions Included the latest squeasel/master from upstream: - fadbf61f9e5dfd08b2d6aad785bb521821c870c8: The function "sq_get_bound_addresses" returns the full IPv6 addresses. - 764cd7dab35e5dabc7bfa3a6d4d5b187ace6b913: Removed PROPFIND and MKCOL methods from squeasel. Testing: - Ran webserver-test - Validated that the impalad debug UI came up fine and made sure that all the existing links work as expected. Change-Id: I9b16116e86b6d39112eace0f3a783aa7f2319a53 Reviewed-on: http://gerrit.cloudera.org:8080/16538 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/thirdparty/squeasel/squeasel.c M be/src/util/webserver-test.cc 2 files changed, 6 insertions(+), 111 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I9b16116e86b6d39112eace0f3a783aa7f2319a53 Gerrit-Change-Number: 16538 Gerrit-PatchSet: 4 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16524 ) Change subject: IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7374/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16524 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b Gerrit-Change-Number: 16524 Gerrit-PatchSet: 5 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 06 Oct 2020 21:42:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9180 (part 2): Refactor executor list map of ExecuterBlacklist
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16506 ) Change subject: IMPALA-9180 (part 2): Refactor executor_list_ map of ExecuterBlacklist .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1ae082d0e080088756af91b5b770752ca8b3aa1 Gerrit-Change-Number: 16506 Gerrit-PatchSet: 9 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 06 Oct 2020 21:25:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9180 (part 2): Refactor executor list map of ExecuterBlacklist
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16506 ) Change subject: IMPALA-9180 (part 2): Refactor executor_list_ map of ExecuterBlacklist .. IMPALA-9180 (part 2): Refactor executor_list_ map of ExecuterBlacklist In current ExecuterBlacklist class, we've keyed maps on a TNetworkAddress of a backend. To simply the logic for the class, changes it to key off of the UniqueIdPB backend-id, eg. refactor 'executor_list_' to no longer be a map list> and instead makes it a map. Also fixes a minor bug with the calculation of elapsed time when a backend that was on probation is re-blacklisted. Testing: - Passed test_blacklist.py and test_query_retries.py. - Passed exhaustive tests. Change-Id: Ib1ae082d0e080088756af91b5b770752ca8b3aa1 Reviewed-on: http://gerrit.cloudera.org:8080/16506 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/scheduling/executor-blacklist.cc M be/src/scheduling/executor-blacklist.h 2 files changed, 69 insertions(+), 103 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ib1ae082d0e080088756af91b5b770752ca8b3aa1 Gerrit-Change-Number: 16506 Gerrit-PatchSet: 10 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking
Fang-Yu Rao has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/16524 ) Change subject: IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking .. IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking We found that Ranger would generate an AuthzAuditEvent as long as there exists a column masking policy corresponding to the column even though the policy does not apply to the requesting user. This resulted in an IllegalStateException if a user "A" submits a SELECT query against a table that has a column specified in a column masking policy when the policy does not apply to "A", i.e., the field of 'Select User' for this policy in the Ranger web UI does not contain "A". For such an AuthzAuditEvent, its field of 'accessType' will not be one of the supported mask types since its corresponding accessResult.isMaskEnabled() would evaluates to false, indicating that there is no matching column masking policy associated with the user "A" and thus the AuthzAuditEvent will not be post-processed by Impala in RangerAuthorizationCheker#createColumnMask(). But since we did not filter out such an AuthzAuditEvent when it was generated and returned from RangerBasePlugin#evalDataMaskPolicies(), we failed the check that requires every AuthzAuditEvent be column masking-related in RangerAuthorizationContext#stashAuditEvents(). To address this issue, in this patch we filter out such an AuthzAuditEvent after each call to RangerBasePlugin#evalDataMaskPolicies() so that no redundant AuthzAuditEvent is generated. Testing: - Added a new column masking policy associated with a non-matching user in RangerAuditLogTest#testAuditsForColumnMasking() to verify that the redundant AuthzAuditEvent is removed. - Verified that the patch passes the exhaustive tests in the DEBUG build. Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b --- M fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java M fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java 2 files changed, 35 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/16524/5 -- To view, visit http://gerrit.cloudera.org:8080/16524 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b Gerrit-Change-Number: 16524 Gerrit-PatchSet: 5 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10205: Replace MD5 with Murmur3 for generating datafile path hash
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16534 ) Change subject: IMPALA-10205: Replace MD5 with Murmur3 for generating datafile path hash .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7373/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7c805f2fdf0cf5a69738579c7e55f4bd047ed59 Gerrit-Change-Number: 16534 Gerrit-PatchSet: 4 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 06 Oct 2020 21:15:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16524 ) Change subject: IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7372/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16524 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b Gerrit-Change-Number: 16524 Gerrit-PatchSet: 3 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 06 Oct 2020 21:03:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10205: Replace MD5 with Murmur3 for generating datafile path hash
Wenzhe Zhou has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/16534 ) Change subject: IMPALA-10205: Replace MD5 with Murmur3 for generating datafile path hash .. IMPALA-10205: Replace MD5 with Murmur3 for generating datafile path hash Current code generates data path hash in MD5 for Iceberg Table. But MD5 is one of forbidden algorithms for FIPS. Even for non-security purposes, like hash map, we still cannot use MD5 algorithm. This patch replaces MD5 with non-cryptographic hash function murmur3_128, which generates hash value with same length as MD5. Testing: - Passed core tests. Change-Id: If7c805f2fdf0cf5a69738579c7e55f4bd047ed59 --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java 6 files changed, 22 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/16534/4 -- To view, visit http://gerrit.cloudera.org:8080/16534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If7c805f2fdf0cf5a69738579c7e55f4bd047ed59 Gerrit-Change-Number: 16534 Gerrit-PatchSet: 4 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng
[Impala-ASF-CR] IMPALA-10189: don't reload partitions for drop stats
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/16516 ) Change subject: IMPALA-10189: don't reload partitions for drop stats .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/16516/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/16516/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1647 PS8, Line 1647: // Remove the ROW_COUNT parameter if it has been set and set numRows to reflect change "set and set" to "set and update". Maybe better to move the numRows comment down further so it's more clear what code that belongs with. http://gerrit.cloudera.org:8080/#/c/16516/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4765 PS8, Line 4765: FileChecksum checkSum = fs.getFileChecksum(filePath); getFileSystem was changed to return null and is being used unchecked here. -- To view, visit http://gerrit.cloudera.org:8080/16516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d Gerrit-Change-Number: 16516 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 06 Oct 2020 20:52:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9664: Fix typo in test event processing.py
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16547 ) Change subject: IMPALA-9664: Fix typo in test_event_processing.py .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I34e16f52722ada2334aeb3bbb187c6c6358d20a3 Gerrit-Change-Number: 16547 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Tue, 06 Oct 2020 20:52:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9664: Fix typo in test event processing.py
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16547 ) Change subject: IMPALA-9664: Fix typo in test_event_processing.py .. IMPALA-9664: Fix typo in test_event_processing.py The test tries to add src_db within a database object which would fail when the database managed location is present. This test doesn't fail currently in ASF master since the database doesn't have managed location yet. It will start failing once the managed location for databases is available in the toolchain build of Hive. Testing: 1. The test was working before the patch since the managed db location was probably not set and modified line was not getting executed. I made sure the test works with the patch as well. 2. I applied the patch in an environment where managed db location is available and the error disappears. (Although the test fails for another unrelated reason HIVE-23995) so we should be aware of this when the toolchain hive build is bumped up. Change-Id: I34e16f52722ada2334aeb3bbb187c6c6358d20a3 Reviewed-on: http://gerrit.cloudera.org:8080/16547 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M tests/custom_cluster/test_event_processing.py 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I34e16f52722ada2334aeb3bbb187c6c6358d20a3 Gerrit-Change-Number: 16547 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar
[Impala-ASF-CR] IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16524 ) Change subject: IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking .. Patch Set 3: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/16524/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16524/1//COMMIT_MSG@9 PS1, Line 9: We found that Ranger would generate an AuthzAuditEvent as long as : there exists a column masking policy corresponding to the column : even though the policy does not apply to the requesting user > Thanks Csaba! I have consulted Abhay Kulkarni on the Ranger project about t Thanks for investigating this! If this is an expected behavior than I agree that it is the best to just remove the extra event as the patch does. http://gerrit.cloudera.org:8080/#/c/16524/1//COMMIT_MSG@30 PS1, Line 30: Furthermore, we also revise all the checks : for the generated AuthzAuditEvent's due to the evaluation of column : masking policies so that a failed check would also result in an entry in : the error log. > Thanks Csaba for the feedback! I will add back the removed checks according I see that you kept both the checks and the logs - do you think that the logs are necessary? The failed preconditions are always added to the error log AFAIK. -- To view, visit http://gerrit.cloudera.org:8080/16524 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b Gerrit-Change-Number: 16524 Gerrit-PatchSet: 3 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 06 Oct 2020 20:51:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10205: Replace MD5 with Murmur3 for generating datafile path hash
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/16534 ) Change subject: IMPALA-10205: Replace MD5 with Murmur3 for generating datafile path hash .. Patch Set 3: Right, we have to construct the absolute paths in loadFileDescFromThrift(). It causes memory overhead on the coordinator and coordinator is more memory intensive. Will keep Murmur hash. -- To view, visit http://gerrit.cloudera.org:8080/16534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7c805f2fdf0cf5a69738579c7e55f4bd047ed59 Gerrit-Change-Number: 16534 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 06 Oct 2020 20:50:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking
Fang-Yu Rao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16524 ) Change subject: IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking .. Patch Set 3: > Patch Set 3: > > Build Failed > > https://jenkins.impala.io/job/gerrit-code-review-checks/7371/ : Initial code > review checks failed. See linked job for details on the failure. The error message provided in the following indicates this failure does not seem to be related to my patch. So I re-triggered the build again. 19:48:12 [ERROR] Failed to execute goal on project impala-minimal-s3a-aws-sdk: Could not resolve dependencies for project org.apache.impala:impala-minimal-s3a-aws-sdk:jar:0.1-SNAPSHOT: Failed to collect dependencies at org.apache.hadoop:hadoop-aws:jar:3.1.1.7.2.1.0-287: Failed to read artifact descriptor for org.apache.hadoop:hadoop-aws:jar:3.1.1.7.2.1.0-287: Could not transfer artifact org.apache.hadoop:hadoop-main:pom:3.1.1.7.2.1.0-287 from/to impala.cdp.repo (https://native-toolchain.s3.amazonaws.com/build/cdp_components/4493826/maven): Connect to native-toolchain.s3.amazonaws.com:443 [native-toolchain.s3.amazonaws.com/52.219.112.106] failed: Connection timed out (Connection timed out) -> [Help 1] -- To view, visit http://gerrit.cloudera.org:8080/16524 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b Gerrit-Change-Number: 16524 Gerrit-PatchSet: 3 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 06 Oct 2020 20:44:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9792: Add ability to split kudu scan ranges
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16385 ) Change subject: IMPALA-9792: Add ability to split kudu scan ranges .. Patch Set 9: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6518/ -- To view, visit http://gerrit.cloudera.org:8080/16385 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia02fd94cc1d13c61bc6cb0765dd2cbe90e9a5ce8 Gerrit-Change-Number: 16385 Gerrit-PatchSet: 9 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 06 Oct 2020 20:44:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10219: Add query option to simulate HDFS listing delay
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16548 ) Change subject: IMPALA-10219: Add query option to simulate HDFS listing delay .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/16548/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16548/2//COMMIT_MSG@10 PS2, Line 10: SIMULATE_CATALOGD_HDFS_LISTING_DELAY_MS > Thanks for this suggestion. I was not aware of this and will look into this I meant "...in the catalog service" -- To view, visit http://gerrit.cloudera.org:8080/16548 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia7196b1ce76415a5faf3fa8575a26d22b2bf50b1 Gerrit-Change-Number: 16548 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 06 Oct 2020 20:39:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10219: Add query option to simulate HDFS listing delay
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16548 ) Change subject: IMPALA-10219: Add query option to simulate HDFS listing delay .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/16548/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16548/2//COMMIT_MSG@10 PS2, Line 10: SIMULATE_CATALOGD_HDFS_LISTING_DELAY_MS > Couldn't we use DEBUG_ACTION in the frontend similarly to delay/failure inj Thanks for this suggestion. I was not aware of this and will look into this. Definitely looks like a good match which can be enhanced to other RPC delays (HMS, Ranger etc) and the catalog service. http://gerrit.cloudera.org:8080/#/c/16548/2/be/src/util/backend-gflag-util.cc File be/src/util/backend-gflag-util.cc: http://gerrit.cloudera.org:8080/#/c/16548/2/be/src/util/backend-gflag-util.cc@89 PS2, Line 89: DECLARE_int64(catalog_simulate_hdfs_listing_delay_ms); > What is the relation of the query option and the flag? I ran through the co yes, originally, I had created this as catalog service configuration which was being picked in the code during initialization of CatalogServiceCatalog. I think this can be removed. I will update the patch. http://gerrit.cloudera.org:8080/#/c/16548/2/tests/custom_cluster/test_topic_update_frequency.py File tests/custom_cluster/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16548/2/tests/custom_cluster/test_topic_update_frequency.py@24 PS2, Line 24: CustomClusterTestSuite > Does this has to be a custom cluster test? I agree that this should be a se This change was done to simulate certain conditions for development of patch for IMPALA-6671 (https://gerrit.cloudera.org/#/c/16549/) The test which uses this option in that patch is a custom cluster test. It looks like with debug actions though, we may have to keep this as a custom cluster test, right? -- To view, visit http://gerrit.cloudera.org:8080/16548 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia7196b1ce76415a5faf3fa8575a26d22b2bf50b1 Gerrit-Change-Number: 16548 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 06 Oct 2020 20:38:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16524 ) Change subject: IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking .. Patch Set 3: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/7371/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16524 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b Gerrit-Change-Number: 16524 Gerrit-PatchSet: 3 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 06 Oct 2020 19:50:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6671: [WIP] Skip locked tables from topic updates
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16549 ) Change subject: IMPALA-6671: [WIP] Skip locked tables from topic updates .. Patch Set 1: (3 comments) I think if this approach works out it would meaningfully improve metadata for production workloads - seems like it should reduce the number of hiccups that non-long-running-ddl queries experience, although obviously with the approach they can eventually get stuck behind a DDL. We'd expect that potentially, this could have some impact on tables where there are multiple long-running DDLs, right? E.g. that table might skip topic updates and delay queries waiting for that particular table. I think that seems fine, but just wanted to be sure. http://gerrit.cloudera.org:8080/#/c/16549/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16549/1//COMMIT_MSG@9 PS1, Line 9: This change adds a mechanism for topic-update thread question, that maybe will be answered as I read the patch - if an update is skipped, does the topic update collection thread get woken up when the lock is released? http://gerrit.cloudera.org:8080/#/c/16549/1//COMMIT_MSG@23 PS1, Line 23: This solution unblocks topic-updates for a while until a Maybe this is the best we can do at the moment, but it does seem to have similar problems to the IMPALA-5058 solution where it does help with latency when it's blocked for shorter periods of time but still gets blocked when a lock is held for a long time. http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16549/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1392 PS1, Line 1392: if (tblVersion > ctx.toVersion && I've seen some cases in production workloads where the topic update thread gets stuck acquiring the table lock, but then has to skip the update anywhere here. Is there any way we can move this check outside of the lock acquisition so that it can exit early in that case? -- To view, visit http://gerrit.cloudera.org:8080/16549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic657b96edbcdc94c6b906e7ca59291f4e4715655 Gerrit-Change-Number: 16549 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 06 Oct 2020 19:50:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10219: Add query option to simulate HDFS listing delay
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16548 ) Change subject: IMPALA-10219: Add query option to simulate HDFS listing delay .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/16548/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16548/2//COMMIT_MSG@10 PS2, Line 10: SIMULATE_CATALOGD_HDFS_LISTING_DELAY_MS Couldn't we use DEBUG_ACTION in the frontend similarly to delay/failure injection to the backend? E.g. CATALOGD_HDFS_LISTING:SLEEP@1000 This may be more work now, but similar delays could be implemented with much less plumbing in the future. I am ok with the current solution if we need to merge this quickly, but I think it worth a TODO, as in the long term DEBUG_ACTION seems better to me. http://gerrit.cloudera.org:8080/#/c/16548/2/be/src/util/backend-gflag-util.cc File be/src/util/backend-gflag-util.cc: http://gerrit.cloudera.org:8080/#/c/16548/2/be/src/util/backend-gflag-util.cc@89 PS2, Line 89: DECLARE_int64(catalog_simulate_hdfs_listing_delay_ms); What is the relation of the query option and the flag? I ran through the code and tests and didn't see any usage of the flag. The commit message also doesn't mention it. http://gerrit.cloudera.org:8080/#/c/16548/2/tests/custom_cluster/test_topic_update_frequency.py File tests/custom_cluster/test_topic_update_frequency.py: http://gerrit.cloudera.org:8080/#/c/16548/2/tests/custom_cluster/test_topic_update_frequency.py@24 PS2, Line 24: CustomClusterTestSuite Does this has to be a custom cluster test? I agree that this should be a serialized test, but I don't see why would we need to restart the cluster. -- To view, visit http://gerrit.cloudera.org:8080/16548 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia7196b1ce76415a5faf3fa8575a26d22b2bf50b1 Gerrit-Change-Number: 16548 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 06 Oct 2020 19:37:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10189: don't reload partitions for drop stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16516 ) Change subject: IMPALA-10189: don't reload partitions for drop stats .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7370/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d Gerrit-Change-Number: 16516 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 06 Oct 2020 19:27:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking
Fang-Yu Rao has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16524 ) Change subject: IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking .. IMPALA-10192: Filter out redundant AuthzAuditEvent's for column masking We found that Ranger would generate an AuthzAuditEvent as long as there exists a column masking policy corresponding to the column even though the policy does not apply to the requesting user. This resulted in an IllegalStateException if a user "A" submits a SELECT query against a table that has a column specified in a column masking policy when the policy does not apply to "A", i.e., the field of 'Select User' for this policy in the Ranger web UI does not contain "A". For such an AuthzAuditEvent, its field of 'accessType' will not be one of the supported mask types since its corresponding accessResult.isMaskEnabled() would evaluates to false, indicating that there is no matching column masking policy associated with the user "A" and thus the AuthzAuditEvent will not be post-processed by Impala in RangerAuthorizationCheker#createColumnMask(). But since we did not filter out such an AuthzAuditEvent when it was generated and returned from RangerBasePlugin#evalDataMaskPolicies(), we failed the check that requires every AuthzAuditEvent be column masking-related in RangerAuthorizationContext#stashAuditEvents(). To address this issue, in this patch we filter out such an AuthzAuditEvent after each call to RangerBasePlugin#evalDataMaskPolicies() so that no redundant AuthzAuditEvent is generated. Furthermore, we also revise all the checks for the generated AuthzAuditEvent's due to the evaluation of column masking policies so that a failed check would also result in an entry in the error log. Testing: - Added a new column masking policy associated with a non-matching user in RangerAuditLogTest#testAuditsForColumnMasking() to verify that the redundant AuthzAuditEvent is removed. - Verified that the patch passes the exhaustive tests in the DEBUG build. Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b --- M fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java M fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationContext.java M fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java 3 files changed, 53 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/16524/3 -- To view, visit http://gerrit.cloudera.org:8080/16524 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1dbf65874003523b5176680e42f26fa2114c229b Gerrit-Change-Number: 16524 Gerrit-PatchSet: 3 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10189: don't reload partitions for drop stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16516 ) Change subject: IMPALA-10189: don't reload partitions for drop stats .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d Gerrit-Change-Number: 16516 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 06 Oct 2020 19:08:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10189: don't reload partitions for drop stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16516 ) Change subject: IMPALA-10189: don't reload partitions for drop stats .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6522/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d Gerrit-Change-Number: 16516 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 06 Oct 2020 19:08:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10189: don't reload partitions for drop stats
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16516 ) Change subject: IMPALA-10189: don't reload partitions for drop stats .. Patch Set 7: Code-Review+2 carry -- To view, visit http://gerrit.cloudera.org:8080/16516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d Gerrit-Change-Number: 16516 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 06 Oct 2020 19:08:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10189: don't reload partitions for drop stats
Hello Quanlong Huang, Kurt Deschler, Vihang Karajgaonkar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16516 to look at the new patch set (#6). Change subject: IMPALA-10189: don't reload partitions for drop stats .. IMPALA-10189: don't reload partitions for drop stats This patch avoids reloading partition metadata from the HMS or filesystems when partition stats are changed, by a DROP STATS command. bulkAlterPartitions() is modified to allow callers to control whether partitions are marked dirty (and therefore reloaded). DDLs that do not mark partitions dirty need to correctly update the in-memory state of the Partition object so that it will be the same as it would be after a reload from the HMS. In this case, numRows_ was not previously updated, but now needs to be updated. Testing: * Ran exhaustive tests * Manually tested some scenarios with modifications to partitions do in Hive outside of Impala. The previous behaviour is preserved (i.e. cached partition state in Impala overwrites the Hive modifictions). * Manually tested dropping and computing regular and incremental stats and inspected the results in both Impala and Hive. We already have automated test coverage for these scenarios as well. Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d --- M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java 3 files changed, 79 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/16516/6 -- To view, visit http://gerrit.cloudera.org:8080/16516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d Gerrit-Change-Number: 16516 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10189: don't reload partitions for drop stats
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16516 ) Change subject: IMPALA-10189: don't reload partitions for drop stats .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/16516/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/16516/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@199 PS5, Line 199: import com.codahale.metrics.Timer; > nit, unnecessary change by IDE? May be move them back to original place. Done http://gerrit.cloudera.org:8080/#/c/16516/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4115 PS5, Line 4115: .. > nit, can remove . Done -- To view, visit http://gerrit.cloudera.org:8080/16516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91009d6e9fdf25b3a3341fd1d29219519eb39a3d Gerrit-Change-Number: 16516 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 06 Oct 2020 19:08:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9952: Fix page index filtering for empty pages
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16503 ) Change subject: IMPALA-9952: Fix page index filtering for empty pages .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0 Gerrit-Change-Number: 16503 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 06 Oct 2020 18:44:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10210: Skip Authentication for connection from a trusted domain
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16542 ) Change subject: IMPALA-10210: Skip Authentication for connection from a trusted domain .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/16542/1/be/src/rpc/authentication-util.cc File be/src/rpc/authentication-util.cc: http://gerrit.cloudera.org:8080/#/c/16542/1/be/src/rpc/authentication-util.cc@181 PS1, Line 181: s = sock_addr.LookupHostname(&host_name); Is there any performance problems with doing a reverse dns lookup on every request? http://gerrit.cloudera.org:8080/#/c/16542/1/be/src/transport/THttpServer.cpp File be/src/transport/THttpServer.cpp: http://gerrit.cloudera.org:8080/#/c/16542/1/be/src/transport/THttpServer.cpp@195 PS1, Line 195: transport_->getOrigin() > for connections originating from the localhost, this returns "localhost" wh Does this ever return an IP address? From looking at the code for TSocket, it seems like this will always return a hostname, in which case of course we should probably handle all hostnames, not just localhost. http://gerrit.cloudera.org:8080/#/c/16542/1/be/src/transport/THttpServer.cpp@206 PS1, Line 206: // Try authenticating with cookies first. This is outdated now http://gerrit.cloudera.org:8080/#/c/16542/1/be/src/util/webserver.cc File be/src/util/webserver.cc: http://gerrit.cloudera.org:8080/#/c/16542/1/be/src/util/webserver.cc@599 PS1, Line 599: // Try authenticating with a cookie first, if enabled. outdated -- To view, visit http://gerrit.cloudera.org:8080/16542 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I09234078e2314dbc3177d0e869ae028e216ca699 Gerrit-Change-Number: 16542 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 06 Oct 2020 17:59:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10198: Unify Java code under the $IMPALA HOME/java directory
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/16500 ) Change subject: IMPALA-10198: Unify Java code under the $IMPALA_HOME/java directory .. Patch Set 3: (1 comment) Missed this review comment somehow. I included random other build fixes in this upload, and I'm going to split it up into a couple changes to make the pieces clear. I'll let you know when this is ready to review. http://gerrit.cloudera.org:8080/#/c/16500/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16500/3//COMMIT_MSG@18 PS3, Line 18: This moves all the Java projects other than fe into > We'll probably have to update some of the wiki pages to reflect this right, The nice thing about this structure is that the maven commands all work on the submodules. Since fe didn't move, the same commands still work from the same directory. I will double check that to be sure. -- To view, visit http://gerrit.cloudera.org:8080/16500 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I08773f4f9d7cb269b0491080078d6e6f490d8d7a Gerrit-Change-Number: 16500 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 06 Oct 2020 17:49:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10221: Rename 'iceberg file format' to 'iceberg.file format' as Iceberg table property
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16550 ) Change subject: IMPALA-10221: Rename 'iceberg_file_format' to 'iceberg.file_format' as Iceberg table property .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16550 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623 Gerrit-Change-Number: 16550 Gerrit-PatchSet: 2 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 06 Oct 2020 16:58:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10221: Rename 'iceberg file format' to 'iceberg.file format' as Iceberg table property
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16550 ) Change subject: IMPALA-10221: Rename 'iceberg_file_format' to 'iceberg.file_format' as Iceberg table property .. IMPALA-10221: Rename 'iceberg_file_format' to 'iceberg.file_format' as Iceberg table property We provide several new table properties in IMPALA-10164, such as 'iceberg.catalog', in order to keep consist of these properties, we rename 'iceberg_file_format' to 'iceberg.file_format'. When we creating Iceberg table, we should use SQL like this: CREATE TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG TBLPROPERTIES ('iceberg.file_format'='parquet', 'iceberg.catalog'='hadoop.tables') Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623 Reviewed-on: http://gerrit.cloudera.org:8080/16550 Reviewed-by: Zoltan Borok-Nagy Tested-by: Impala Public Jenkins --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 6 files changed, 22 insertions(+), 22 deletions(-) Approvals: Zoltan Borok-Nagy: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/16550 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623 Gerrit-Change-Number: 16550 Gerrit-PatchSet: 3 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng
[Impala-ASF-CR] IMPALA-10188: Remove unused WebDAV functions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16538 ) Change subject: IMPALA-10188: Remove unused WebDAV functions .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b16116e86b6d39112eace0f3a783aa7f2319a53 Gerrit-Change-Number: 16538 Gerrit-PatchSet: 3 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 06 Oct 2020 16:18:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10188: Remove unused WebDAV functions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16538 ) Change subject: IMPALA-10188: Remove unused WebDAV functions .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6521/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b16116e86b6d39112eace0f3a783aa7f2319a53 Gerrit-Change-Number: 16538 Gerrit-PatchSet: 3 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 06 Oct 2020 16:19:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9180 (part 2): Refactor executor list map of ExecuterBlacklist
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16506 ) Change subject: IMPALA-9180 (part 2): Refactor executor_list_ map of ExecuterBlacklist .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6520/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1ae082d0e080088756af91b5b770752ca8b3aa1 Gerrit-Change-Number: 16506 Gerrit-PatchSet: 9 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 06 Oct 2020 16:10:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9180 (part 2): Refactor executor list map of ExecuterBlacklist
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16506 ) Change subject: IMPALA-9180 (part 2): Refactor executor_list_ map of ExecuterBlacklist .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1ae082d0e080088756af91b5b770752ca8b3aa1 Gerrit-Change-Number: 16506 Gerrit-PatchSet: 9 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 06 Oct 2020 16:10:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10218: Remove impala.cdh.repo Maven repository
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16543 ) Change subject: IMPALA-10218: Remove impala.cdh.repo Maven repository .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7369/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I44b587f936ae20c207c74a9800cf98baa464164a Gerrit-Change-Number: 16543 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Tue, 06 Oct 2020 15:50:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10218: Remove impala.cdh.repo Maven repository
Joe McDonnell has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16543 ) Change subject: IMPALA-10218: Remove impala.cdh.repo Maven repository .. IMPALA-10218: Remove impala.cdh.repo Maven repository This removes the impala.cdh.repo Maven repository (i.e. the repository for the CDH_BUILD_NUMBER). It removes the associated code for CDH_BUILD_NUMBER. The only remaining dependency for the CDH_BUILD_NUMBER repository was Apache Kite in some of our test code. This transitions that code to use the public version of Apache Kite. The testdata/TableFlattener Java project is intended to be used manually and is not used for any tests. It has bitrotted and currently does not build. I verified that it now builds (which it currently did not), but I did not verify functionality. Testing: - Ran a core job - Built testdata/TableFlattener Java project Change-Id: I44b587f936ae20c207c74a9800cf98baa464164a --- M bin/impala-config.sh M buildall.sh M impala-parent/pom.xml M testdata/TableFlattener/pom.xml M testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/Main.java M testdata/pom.xml 6 files changed, 55 insertions(+), 116 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/16543/3 -- To view, visit http://gerrit.cloudera.org:8080/16543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I44b587f936ae20c207c74a9800cf98baa464164a Gerrit-Change-Number: 16543 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell
[Impala-ASF-CR] IMPALA-10218: Remove impala.cdh.repo Maven repository
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/16543 ) Change subject: IMPALA-10218: Remove impala.cdh.repo Maven repository .. Patch Set 3: Thanks for the review! -- To view, visit http://gerrit.cloudera.org:8080/16543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I44b587f936ae20c207c74a9800cf98baa464164a Gerrit-Change-Number: 16543 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Tue, 06 Oct 2020 15:30:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10218: Remove impala.cdh.repo Maven repository
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16543 ) Change subject: IMPALA-10218: Remove impala.cdh.repo Maven repository .. IMPALA-10218: Remove impala.cdh.repo Maven repository This removes the impala.cdh.repo Maven repository (i.e. the repository for the CDH_BUILD_NUMBER). It removes the associated code for CDH_BUILD_NUMBER. The only remaining dependency for the CDH_BUILD_NUMBER repository was Apache Kite in some of our test code. This transitions that code to use the public version of Apache Kite. The testdata/TableFlattener Java project is intended to be used manually and is not used for any tests. It has bitrotted and currently does not build. I verified that it now builds (which it currently did not), but I did not verify functionality. Testing: - Ran a core job - Built testdata/TableFlattener Java project Change-Id: I44b587f936ae20c207c74a9800cf98baa464164a Reviewed-on: http://gerrit.cloudera.org:8080/16543 Tested-by: Impala Public Jenkins Reviewed-by: Csaba Ringhofer --- M bin/impala-config.sh M buildall.sh M impala-parent/pom.xml M testdata/TableFlattener/pom.xml M testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/Main.java M testdata/pom.xml 6 files changed, 55 insertions(+), 116 deletions(-) Approvals: Impala Public Jenkins: Verified Csaba Ringhofer: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/16543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I44b587f936ae20c207c74a9800cf98baa464164a Gerrit-Change-Number: 16543 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell
[Impala-ASF-CR] IMPALA-9664: Fix typo in test event processing.py
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16547 ) Change subject: IMPALA-9664: Fix typo in test_event_processing.py .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I34e16f52722ada2334aeb3bbb187c6c6358d20a3 Gerrit-Change-Number: 16547 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Tue, 06 Oct 2020 15:26:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9664: Fix typo in test event processing.py
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16547 ) Change subject: IMPALA-9664: Fix typo in test_event_processing.py .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6519/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I34e16f52722ada2334aeb3bbb187c6c6358d20a3 Gerrit-Change-Number: 16547 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Tue, 06 Oct 2020 15:26:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9792: Add ability to split kudu scan ranges
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16385 ) Change subject: IMPALA-9792: Add ability to split kudu scan ranges .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16385 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia02fd94cc1d13c61bc6cb0765dd2cbe90e9a5ce8 Gerrit-Change-Number: 16385 Gerrit-PatchSet: 9 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 06 Oct 2020 15:24:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9792: Add ability to split kudu scan ranges
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16385 ) Change subject: IMPALA-9792: Add ability to split kudu scan ranges .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6518/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16385 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia02fd94cc1d13c61bc6cb0765dd2cbe90e9a5ce8 Gerrit-Change-Number: 16385 Gerrit-PatchSet: 9 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 06 Oct 2020 15:24:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9952: Fix page index filtering for empty pages
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16503 ) Change subject: IMPALA-9952: Fix page index filtering for empty pages .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7368/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0 Gerrit-Change-Number: 16503 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 06 Oct 2020 15:21:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9952: Fix page index filtering for empty pages
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16503 ) Change subject: IMPALA-9952: Fix page index filtering for empty pages .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/16503/2/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/16503/2/be/src/exec/parquet/hdfs-parquet-scanner.cc@824 PS2, Line 824: if (!ComputeCandidatePages(page_locations, candidate_ranges_, row_group.num_rows, > this is not needed, as we set it to false in ResetPageFiltering() anyway Done http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/parquet-common.cc File be/src/exec/parquet/parquet-common.cc: http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/parquet-common.cc@172 PS1, Line 172: while (next_page_idx < page_locations.size() && :page_locations[next_page_idx].compressed_page_size == 0) { : ++next_page_idx; : } > I don't think that copying a vector with the size of the number of pages wo Yeah, for now I'd rather leave it that way. -- To view, visit http://gerrit.cloudera.org:8080/16503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0 Gerrit-Change-Number: 16503 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 06 Oct 2020 15:03:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9952: Fix page index filtering for empty pages
Hello Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16503 to look at the new patch set (#3). Change subject: IMPALA-9952: Fix page index filtering for empty pages .. IMPALA-9952: Fix page index filtering for empty pages As IMPALA-4371 and IMPALA-10186 points out, Impala might write empty data pages. It usually does that when it has to write a bigger page than the current page size. If we really need to write empty data pages is a different question, but we need to handle them correctly as there are already such files out there. The corresponding Parquet offset index entries to empty data pages are invalid PageLocation objects with 'compressed_page_size' = 0. Before this commit Impala didn't ignore the empty page locations, but generated a warning. Since invalid page index doesn't fail a scan by default, Impala continued scanning the file with semi-initialized page filtering. This resulted in 'Top level rows aren't in sync' error, or a crash in DEBUG builds. With this commit Impala ignores empty data pages and still able to filter the rest of the pages. Also, if the page index is corrupt for some other reason, Impala correctly resets the page filtering logic and falls back to regular scanning. Testing: * Added unit test for empty data pages * Added e2e test for empty data pages * Added e2e test for invalid page index Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0 --- M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-common-test.cc M be/src/exec/parquet/parquet-common.cc M testdata/data/README A testdata/data/alltypes_empty_pages.parquet A testdata/data/alltypes_invalid_pages.parquet M testdata/workloads/functional-query/queries/QueryTest/parquet-page-index.test M tests/query_test/test_parquet_stats.py 9 files changed, 195 insertions(+), 40 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/16503/3 -- To view, visit http://gerrit.cloudera.org:8080/16503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0 Gerrit-Change-Number: 16503 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10220: Fix negative value bug in RpcNetworkTime counter.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16552 ) Change subject: IMPALA-10220: Fix negative value bug in RpcNetworkTime counter. .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7367/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16552 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a4d65a3e0f88349bd4ee1b01290bd2c386acc69 Gerrit-Change-Number: 16552 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 06 Oct 2020 14:50:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9952: Fix page index filtering for empty pages
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16503 ) Change subject: IMPALA-9952: Fix page index filtering for empty pages .. Patch Set 2: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/16503/2/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/16503/2/be/src/exec/parquet/hdfs-parquet-scanner.cc@824 PS2, Line 824: filter_pages_ = false; this is not needed, as we set it to false in ResetPageFiltering() anyway http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/parquet-common.cc File be/src/exec/parquet/parquet-common.cc: http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/parquet-common.cc@172 PS1, Line 172: int next_page_idx = page_idx + 1; : while (next_page_idx < page_locations.size() && :page_locations[next_page_idx].compressed_page_size == 0) { : > Yeah it'd also work, but I didn't want to dynamically allocate a new vector I don't think that copying a vector with the size of the number of pages would matter, but I am ok with the current solution if you prefer it. -- To view, visit http://gerrit.cloudera.org:8080/16503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0 Gerrit-Change-Number: 16503 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 06 Oct 2020 14:46:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10220: Fix negative value bug in RpcNetworkTime counter.
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16552 Change subject: IMPALA-10220: Fix negative value bug in RpcNetworkTime counter. .. IMPALA-10220: Fix negative value bug in RpcNetworkTime counter. Total RPC time was incorrectly computed using resp_.receiver_latency_ns() in function EndDataStreamCompleteCb(). This patch fix the bug by replacing it with eos_rsp_.receiver_latency_ns(). This patch also fix logging mistakes in LogSlowRpc() to use its 'resp' parameter instead of 'resp_' field member. Testing: - Manually run data loading query that exhibit the bug for several times and verify that the Min value of RpcNetworkTime counter is always positive after the patch. The query used in testing is insert query to TPC-DS fact table store_sales of scale 10GB in single machine mini cluster. - Add DCHECK to verify that total rpc time is greater than or equal to receiver_latency_ns. - Run and pass core tests. Change-Id: I2a4d65a3e0f88349bd4ee1b01290bd2c386acc69 --- M be/src/runtime/krpc-data-stream-sender.cc 1 file changed, 8 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/16552/1 -- To view, visit http://gerrit.cloudera.org:8080/16552 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I2a4d65a3e0f88349bd4ee1b01290bd2c386acc69 Gerrit-Change-Number: 16552 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
[Impala-ASF-CR] IMPALA-9952: Fix page index filtering for empty pages
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16503 ) Change subject: IMPALA-9952: Fix page index filtering for empty pages .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7366/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0 Gerrit-Change-Number: 16503 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 06 Oct 2020 13:59:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9952: Fix page index filtering for empty pages
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16503 ) Change subject: IMPALA-9952: Fix page index filtering for empty pages .. Patch Set 2: (5 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/16503/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16503/1//COMMIT_MSG@11 PS1, Line 11: data > typo: data Done http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@637 PS1, Line 637: if (filter_pages_ && candidate_ranges_.empty()) { > nit: extra line Done http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@724 PS1, Line 724: ETURN_IF_ERR > Maybe make filter_pages a member, and reset it in ResetPageFiltering? Done http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/parquet-common.cc File be/src/exec/parquet/parquet-common.cc: http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/parquet-common.cc@142 PS1, Line 142: auto& last_valid_page = page_locations[last_valid_idx]; : > can you add some comment in this block? e.g. first_row_index must have prog Done http://gerrit.cloudera.org:8080/#/c/16503/1/be/src/exec/parquet/parquet-common.cc@172 PS1, Line 172: int next_page_idx = page_idx + 1; : while (next_page_idx < page_locations.size() && :page_locations[next_page_idx].compressed_page_size == 0) { : > I may have missed something, but wouldn't it be simpler to create a copy of Yeah it'd also work, but I didn't want to dynamically allocate a new vector and copy integers. Though I didn't measure what would be the actual overhead. -- To view, visit http://gerrit.cloudera.org:8080/16503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0 Gerrit-Change-Number: 16503 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 06 Oct 2020 13:40:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9952: Fix page index filtering for empty pages
Hello Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16503 to look at the new patch set (#2). Change subject: IMPALA-9952: Fix page index filtering for empty pages .. IMPALA-9952: Fix page index filtering for empty pages As IMPALA-4371 and IMPALA-10186 points out, Impala might write empty data pages. It usually does that when it has to write a bigger page than the current page size. If we really need to write empty data pages is a different question, but we need to handle them correctly as there are already such files out there. The corresponding Parquet offset index entries to empty data pages are invalid PageLocation objects with 'compressed_page_size' = 0. Before this commit Impala didn't ignore the empty page locations, but generated a warning. Since invalid page index doesn't fail a scan by default, Impala continued scanning the file with semi-initialized page filtering. This resulted in 'Top level rows aren't in sync' error, or a crash in DEBUG builds. With this commit Impala ignores empty data pages and still able to filter the rest of the pages. Also, if the page index is corrupt for some other reason, Impala correctly resets the page filtering logic and falls back to regular scanning. Testing: * Added unit test for empty data pages * Added e2e test for empty data pages * Added e2e test for invalid page index Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0 --- M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-common-test.cc M be/src/exec/parquet/parquet-common.cc M testdata/data/README A testdata/data/alltypes_empty_pages.parquet A testdata/data/alltypes_invalid_pages.parquet M testdata/workloads/functional-query/queries/QueryTest/parquet-page-index.test M tests/query_test/test_parquet_stats.py 9 files changed, 196 insertions(+), 40 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/16503/2 -- To view, visit http://gerrit.cloudera.org:8080/16503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0 Gerrit-Change-Number: 16503 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16551 ) Change subject: IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7365/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16551 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idc75cd23045b274885607c45886319f4f6da19de Gerrit-Change-Number: 16551 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 06 Oct 2020 12:37:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16551 ) Change subject: IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/16551/1/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java: http://gerrit.cloudera.org:8080/#/c/16551/1/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java@49 PS1, Line 49: throw new AnalysisException("The parameter of a partition transform should " + > line too long (91 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/16551 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idc75cd23045b274885607c45886319f4f6da19de Gerrit-Change-Number: 16551 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 06 Oct 2020 12:17:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16551 to look at the new patch set (#2). Change subject: IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables .. IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables This patch adds support for Iceberg Bucket and Truncate partition transforms. Both accept a parameter: number of buckets and width respectively. Usage: CREATE TABLE tbl_name (i int, p1 int, p2 timestamp) PARTITION BY SPEC ( p1 BUCKET 10, p1 TRUNCATE 5 ) STORED AS ICEBERG TBLPROPERTIES ('iceberg.catalog'='hadoop.tables'); Change-Id: Idc75cd23045b274885607c45886319f4f6da19de --- M common/thrift/CatalogObjects.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java A fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 12 files changed, 340 insertions(+), 65 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/16551/2 -- To view, visit http://gerrit.cloudera.org:8080/16551 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idc75cd23045b274885607c45886319f4f6da19de Gerrit-Change-Number: 16551 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10221: Rename 'iceberg file format' to 'iceberg.file format' as Iceberg table property
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16550 ) Change subject: IMPALA-10221: Rename 'iceberg_file_format' to 'iceberg.file_format' as Iceberg table property .. Patch Set 2: Code-Review+2 Thanks for this fix, LGTM! -- To view, visit http://gerrit.cloudera.org:8080/16550 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623 Gerrit-Change-Number: 16550 Gerrit-PatchSet: 2 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 06 Oct 2020 11:28:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10221: Rename 'iceberg file format' to 'iceberg.file format' as Iceberg table property
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16550 ) Change subject: IMPALA-10221: Rename 'iceberg_file_format' to 'iceberg.file_format' as Iceberg table property .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6517/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16550 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623 Gerrit-Change-Number: 16550 Gerrit-PatchSet: 2 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 06 Oct 2020 11:28:30 + Gerrit-HasComments: No
[Impala-ASF-CR] MPALA-10075: Reuse unchanged partition instances
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16392 ) Change subject: MPALA-10075: Reuse unchanged partition instances .. Patch Set 5: (10 comments) http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG@7 PS5, Line 7: MPALA typo: IMPALA http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG@11 PS5, Line 11: redudant typo: redundant http://gerrit.cloudera.org:8080/#/c/16392/5//COMMIT_MSG@13 PS5, Line 13: changes typo: changed http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@293 PS5, Line 293: if (oldFd == null || oldFd.getFileLength() != fd.getFileLength() : || oldFd.getModificationTime() != fd.getModificationTime() Are you sure that we shouldn't check other properties too? Or the modification time will change for all other changes? An example that comes to mind is HDFS rebalancing - my guess is that if the locations of some blocks change for a file, the modification time won't change, so we won't propagate the change, which can lead to suboptimal plans. One more thing: I would prefer to move this logic to FileDescriptor, e.g. in a function like equalsTo(), or isUnchanged(). http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1541 PS5, Line 1541: filesChanged_ I think that relying on filesChanged_ is counter intuitive in equalsTo. Changing the name to something like "equalsToOriginal" would reflect the semantics better, as we assume here that the partition is compared to the same partition before the update. Checking the files here again is also a solution - it would waste some CPU but the semantics would be clear. http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1541 PS5, Line 1541: oldInstance == oldInstance_ Is it possible for these two to be different? http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1545 PS5, Line 1545: oldInstance_ We are using oldInstance_ here and in the last line while using oldInstance in all other places. This is correct as we have checked that the two are equal, but I would prefer more consistency. http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/16392/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@933 PS5, Line 933: // TODO: trace Do you plan to change the logging below? http://gerrit.cloudera.org:8080/#/c/16392/5/tests/metadata/test_reuse_partitions.py File tests/metadata/test_reuse_partitions.py: http://gerrit.cloudera.org:8080/#/c/16392/5/tests/metadata/test_reuse_partitions.py@49 PS5, Line 49: test_reuse_partitions An idea to extend this test is to run the same queries for an insert_only transactional table. Some operations are not supported there (ALTER TABLE), but I think that other operations should lead to the same result. What may affect this is writeId - if it is changed while the partition actually doesn't change, it could lead to less reuse than in the the non-transactional case. It would be good to verify that this doesn't happen. http://gerrit.cloudera.org:8080/#/c/16392/5/tests/metadata/test_reuse_partitions.py@55 PS5, Line 55: "create table %s.%s (id int) partitioned by (p int) stored as textfile" : % (unique_database, tbl_name)) nit: +2 indentation -- To view, visit http://gerrit.cloudera.org:8080/16392 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2dd645c260d271291021e52fdac4b74924df1170 Gerrit-Change-Number: 16392 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 06 Oct 2020 10:44:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10215: Implement INSERT INTO for non-partitioned Iceberg tables (Parquet)
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16545 ) Change subject: IMPALA-10215: Implement INSERT INTO for non-partitioned Iceberg tables (Parquet) .. Patch Set 1: (3 comments) Thanks for this new feature, some small nits http://gerrit.cloudera.org:8080/#/c/16545/1/be/src/exec/output-partition.h File be/src/exec/output-partition.h: http://gerrit.cloudera.org:8080/#/c/16545/1/be/src/exec/output-partition.h@25 PS1, Line 25: needed Needed? http://gerrit.cloudera.org:8080/#/c/16545/1/common/thrift/Descriptors.thrift File common/thrift/Descriptors.thrift: http://gerrit.cloudera.org:8080/#/c/16545/1/common/thrift/Descriptors.thrift@54 PS1, Line 54: struct TIcebergColumnDescriptor { : 1: required i32 fieldId : } Maybe some comment here? If this struct just contains fieldId, why not just use i32 in TColumnDescriptor? http://gerrit.cloudera.org:8080/#/c/16545/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/16545/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@213 PS1, Line 213: String tableLoc = feIcebergTable.getLocation(); : HadoopTables tables = IcebergUtil.getHadoopTables(); These two variables not been used in this method. -- To view, visit http://gerrit.cloudera.org:8080/16545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5690fb6c2cc51f0033fa26caf8597c80a11bcd8e Gerrit-Change-Number: 16545 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 06 Oct 2020 09:45:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10221: Rename 'iceberg file format' to 'iceberg.file format' as Iceberg table property
Hello Gabor Kaszab, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16550 to look at the new patch set (#2). Change subject: IMPALA-10221: Rename 'iceberg_file_format' to 'iceberg.file_format' as Iceberg table property .. IMPALA-10221: Rename 'iceberg_file_format' to 'iceberg.file_format' as Iceberg table property We provide several new table properties in IMPALA-10164, such as 'iceberg.catalog', in order to keep consist of these properties, we rename 'iceberg_file_format' to 'iceberg.file_format'. When we creating Iceberg table, we should use SQL like this: CREATE TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG TBLPROPERTIES ('iceberg.file_format'='parquet', 'iceberg.catalog'='hadoop.tables') Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623 --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 6 files changed, 22 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/16550/2 -- To view, visit http://gerrit.cloudera.org:8080/16550 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623 Gerrit-Change-Number: 16550 Gerrit-PatchSet: 2 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10221: Rename 'iceberg file format' to 'iceberg.file format' as Iceberg table property
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16550 ) Change subject: IMPALA-10221: Rename 'iceberg_file_format' to 'iceberg.file_format' as Iceberg table property .. Patch Set 2: (1 comment) Thanks for review! http://gerrit.cloudera.org:8080/#/c/16550/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16550/1//COMMIT_MSG@9 PS1, Line 9: > nit: in commit message body please keep the lines under 72 chars width. Done -- To view, visit http://gerrit.cloudera.org:8080/16550 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623 Gerrit-Change-Number: 16550 Gerrit-PatchSet: 2 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 06 Oct 2020 09:30:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10221: Rename 'iceberg file format' to 'iceberg.file format' as Iceberg table property
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16550 ) Change subject: IMPALA-10221: Rename 'iceberg_file_format' to 'iceberg.file_format' as Iceberg table property .. Patch Set 1: Code-Review+1 (1 comment) Only a small nit about the commit message, otherwise LGTM! http://gerrit.cloudera.org:8080/#/c/16550/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16550/1//COMMIT_MSG@9 PS1, Line 9: g.catalog', nit: in commit message body please keep the lines under 72 chars width. -- To view, visit http://gerrit.cloudera.org:8080/16550 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623 Gerrit-Change-Number: 16550 Gerrit-PatchSet: 1 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 06 Oct 2020 09:27:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16551 ) Change subject: IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7364/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16551 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idc75cd23045b274885607c45886319f4f6da19de Gerrit-Change-Number: 16551 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 06 Oct 2020 09:25:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16551 ) Change subject: IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/16551/1/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java: http://gerrit.cloudera.org:8080/#/c/16551/1/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java@49 PS1, Line 49: throw new AnalysisException("The parameter of a partition transform should be " + line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/16551 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idc75cd23045b274885607c45886319f4f6da19de Gerrit-Change-Number: 16551 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 06 Oct 2020 09:06:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables
Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16551 Change subject: IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables .. IMPALA-10165: Implement Bucket and Truncate partition transforms for Iceberg tables This patch adds support for Iceberg Bucket and Truncate partition transforms. Both accept a parameter: number of buckets and width respectively. Usage: CREATE TABLE tbl_name (i int, p1 int, p2 timestamp) PARTITION BY SPEC ( p1 BUCKET 10, p1 TRUNCATE 5 ) STORED AS ICEBERG TBLPROPERTIES ('iceberg.catalog'='hadoop.tables'); Change-Id: Idc75cd23045b274885607c45886319f4f6da19de --- M common/thrift/CatalogObjects.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java A fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 12 files changed, 340 insertions(+), 65 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/16551/1 -- To view, visit http://gerrit.cloudera.org:8080/16551 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Idc75cd23045b274885607c45886319f4f6da19de Gerrit-Change-Number: 16551 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab
[Impala-ASF-CR] IMPALA-10205: Replace MD5 with Murmur3 for generating datafile path hash
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16534 ) Change subject: IMPALA-10205: Replace MD5 with Murmur3 for generating datafile path hash .. Patch Set 3: Looking at THdfsFileDesc it might be not possible to avoid memory overhead at the coordinator-side. The file descriptors only store the relative path, but we need the absolute path to filter Iceberg data files, therefore we need to construct the paths from + . So maybe we should just stick with Murmur3? I've spotted another issue with our FileDescriptor class. For getRelativePath() it uses the underlying flat buffer to return a value: https://github.com/apache/impala/blob/d453d52aadcbd158147b906813b22eb2944ac90b/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L231 Since the flat buffer doesn't contain a Java String object, whenever we issue fileDesc.getRelativePath() it returns a newly constructed java String each time. -- To view, visit http://gerrit.cloudera.org:8080/16534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7c805f2fdf0cf5a69738579c7e55f4bd047ed59 Gerrit-Change-Number: 16534 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 06 Oct 2020 08:23:46 + Gerrit-HasComments: No