[Impala-ASF-CR] IMPALA-12362: (part-1/4) Refactor service management scripts.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20921 ) Change subject: IMPALA-12362: (part-1/4) Refactor service management scripts. .. Patch Set 15: Code-Review+2 Carry Zihao's +1. Thanks for improving the packaging support! -- To view, visit http://gerrit.cloudera.org:8080/20921 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8f4dcad9cfa12d351d562e7ef8c0a8957d3ca147 Gerrit-Change-Number: 20921 Gerrit-PatchSet: 15 Gerrit-Owner: Xiang Yang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Xiang Yang Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: ttz <2433038...@qq.com> Gerrit-Comment-Date: Sat, 06 Apr 2024 06:14:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12362: (part-1/4) Refactor service management scripts.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20921 ) Change subject: IMPALA-12362: (part-1/4) Refactor service management scripts. .. Patch Set 16: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10498/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/20921 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8f4dcad9cfa12d351d562e7ef8c0a8957d3ca147 Gerrit-Change-Number: 20921 Gerrit-PatchSet: 16 Gerrit-Owner: Xiang Yang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Xiang Yang Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: ttz <2433038...@qq.com> Gerrit-Comment-Date: Sat, 06 Apr 2024 06:14:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12362: (part-1/4) Refactor service management scripts.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20921 ) Change subject: IMPALA-12362: (part-1/4) Refactor service management scripts. .. Patch Set 16: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20921 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8f4dcad9cfa12d351d562e7ef8c0a8957d3ca147 Gerrit-Change-Number: 20921 Gerrit-PatchSet: 16 Gerrit-Owner: Xiang Yang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Xiang Yang Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: ttz <2433038...@qq.com> Gerrit-Comment-Date: Sat, 06 Apr 2024 06:14:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12894: Addendum: Re-enable test plain count star optimization
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21249 ) Change subject: IMPALA-12894: Addendum: Re-enable test_plain_count_star_optimization .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21249 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30629632742c0d402a6bb852a169359edac59eba Gerrit-Change-Number: 21249 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 06 Apr 2024 01:14:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12921: [WIP] Support locally built Ranger
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21160 ) Change subject: IMPALA-12921: [WIP] Support locally built Ranger .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15788/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21160 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I268d6d4d6e371da7497aac8d12f78178d57c6f27 Gerrit-Change-Number: 21160 Gerrit-PatchSet: 3 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 06 Apr 2024 00:46:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12921: [WIP] Support locally built Ranger
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21160 to look at the new patch set (#3). Change subject: IMPALA-12921: [WIP] Support locally built Ranger .. IMPALA-12921: [WIP] Support locally built Ranger This patch adds the support for locally built Ranger in Impala's minicluster. The instructions on how to build Ranger locally and point Impala to the locally built Ranger server are provided as below. Suppose the Ranger project is under the folder $RANGER_SRC_DIR. We execute the following to build Ranger. By default, the compressed tar files are produced under $RANGER_SRC_DIR/target. mvn clean compile -B -nsu -DskipCheck=true -Dcheckstyle.skip=true \ package install -DskipITs -DskipTests -Dmaven.javadoc.skip=true We also need to build Impala's Java code so that Impala's Java code could consume the locally produced Ranger classes. It then suffices to execute the following to point Impala to the locally built Ranger server. 1. export RANGER_VERSION_OVERRIDE=\ $(mvn -f $RANGER_SRC_DIR/pom.xml -q help:evaluate \ -Dexpression=project.version -DforceStdout) 2. export RANGER_HOME_OVERRIDE=$RANGER_SRC_DIR/target/\ ranger-${RANGER_VERSION_OVERRIDE}-admin 3. source $IMPALA_HOME/bin/impala-config.sh 4. tar zxv -f $RANGER_SRC_DIR/target/\ ranger-${IMPALA_RANGER_VERSION}-admin.tar.gz \ -C $RANGER_SRC_DIR/target/ 5. $IMPALA_HOME/bin/create-test-configuration.sh 6. $IMPALA_HOME/bin/create-test-configuration.sh \ -create_ranger_policy_db 7. $IMPALA_HOME/testdata/bin/run-ranger-server.sh 8. $IMPALA_HOME/testdata/bin/setup-ranger.sh To-do: - Wait for RANGER-4770 to be resolved. - Wait for RANGER-4771 to be resolved. Change-Id: I268d6d4d6e371da7497aac8d12f78178d57c6f27 --- M bin/impala-config.sh M testdata/bin/setup-ranger.sh R testdata/cluster/ranger/setup/all_database_policy_revised.json.template M testdata/cluster/ranger/setup/impala_user_non_owner.json.template M testdata/cluster/ranger/setup/impala_user_owner.json.template 5 files changed, 25 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/21160/3 -- To view, visit http://gerrit.cloudera.org:8080/21160 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I268d6d4d6e371da7497aac8d12f78178d57c6f27 Gerrit-Change-Number: 21160 Gerrit-PatchSet: 3 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12291 impala checks hdfs ranger policy
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/20221 ) Change subject: IMPALA-12291 impala checks hdfs ranger policy .. Patch Set 9: > Patch Set 9: > > Hi all, based on our discussion so far, there are 2 approaches discussed. Any > any better alternative is also appreciated! > - Set the access level of a loaded table to both READ and WRITE as long as > Ranger is used as the authorization provider. > - Introduce a startup flag to allow the administrator to decide whether to > skip the file system permissions check during table loading. > > There are some implementation details to consider for each approach. > - How to determine whether Ranger is enabled (corresponding to the 1st > approach)? It seems checking the value of the key > 'dfs.namenode.inode.attributes.provider.class' in core-site.xml via the > instance of Configuration as done in the patch set 9 could not be verified > easily via a new test due to HDFS Ranger plug-in not being configured > correctly. To be more specific, if we try to add the following configuration > via > https://github.com/apache/impala/blob/master/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py, > the name node of HDFS could not be started. More plumbing has to be done to > set up the plug-in. On the other hand, using the instance of BackendConfig > allows us to add an end-to-end test to briefly verify Impala's behavior after > the patch more easily. > > > dfs.namenode.inode.attributes.provider.class > org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer > > > - What should be the default behavior with respect to setting the access > level of a loaded table after this patch (corresponding to the 2nd approach)? > It looks like making Impala assume the READ and WRITE access by default may > be better especially in the legacy catalog mode. This relieves the Impala > administrator of having to manually tweak the HDFS access control entries of > the target HDFS paths that are not writable to the Impala service every time > when an end user encounters such a problem. > > I have also collected the related tests that need to be revised if we decide > to adopt the 2nd approach and make Impala assume the READ and WRITE access by > default. > - > https://github.com/apache/impala/blob/master/tests/metadata/test_hdfs_permissions.py#L56 > (TestHdfsPermissions.test_insert_into_read_only_table()). > - > https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L563 > (TestInsertBehaviour.test_multiple_group_acls). > - > https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L331 > (TestInsertBehaviour.test_readonly_table_dir). > - > https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L362 > (TestInsertBehaviour.test_insert_acl_permissions). > - > https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L439 > (TestInsertBehaviour.test_load_permissions). > - > https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L252 > (TestInsertBehaviour.test_mixed_partition_permissions). > - > https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L202 > (TestInsertBehaviour.test_insert_file_permissions). > - > https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java#L4008 > (AnalyzeStmtsTest.TestLoadData). > - > https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java#L4042-L4047 > (AnalyzeStmtsTest.TestInsert). > - > https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java#L110-L119 > (CatalogObjectToFromThriftTest.TestPartitionedTable). The first option sounds good to me: - Set the access level of a loaded table to both READ and WRITE as long as Ranger is used as the authorization provider. The second option of introducing a startup flag would create confusion because it would only apply to the legacy catalog mode, not the local catalog mode. Further, for the cloud object stores, we are in any case doing the first approach. The downside of the first approach is that it will defer the full access level checks to execution time and suppose some partitions allow the write operation while other partitions don't, we can end up with partial writes into the table. However, this issue already exists with the local catalog mode (which is the default mode). -- To view, visit http://gerrit.cloudera.org:8080/20221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id33c400fbe0c918b6b65d713b09009512835a4c9 Gerrit-Change-Number: 20221 Gerrit-PatchSet:
[Impala-ASF-CR] IMPALA-12543: Detect self-events before finishing DDL
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21029 ) Change subject: IMPALA-12543: Detect self-events before finishing DDL .. Patch Set 20: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15787/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79 Gerrit-Change-Number: 21029 Gerrit-PatchSet: 20 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 05 Apr 2024 23:17:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21109 ) Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple queries .. Patch Set 17: (14 comments) http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java: http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java@1 PS17, Line 1: /* In some files, the License header is enclosed in the multi statement syntax /*. .. */ whereas in others it is the single line comment. Pls make it consistent in all the files and use whatever is being used in existing Impala files. Impala uses the single line comment style. '// ' http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java: http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java@30 PS17, Line 30: queryCtx; Pls add underscore http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java@31 PS17, Line 31: stmtTableCache Add underscore http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java: http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@94 PS16, Line 94: : nit: add space or newline after : so that the query string is separated out. http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@148 PS16, Line 148: if (e == null) { > I wanted the ability to print the stack trace hree. But I also throw an In So this should be e != null then. http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@166 PS16, Line 166: if (line.trim().startsWith("--") || line.trim().equals("")) { Comments can be multi-statement also e.g. /* ignore this */ SELECT a, b, c. There could be other patterns not handled here. Can we not use the parser methods directly ? If this is supposed to be a temporary function or if this is expected to go through revision later, pls add the relevant comments. http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@209 PS16, Line 209: LOG.info("Using Impala planner"); Can you make this consistent with the corresponding message for Calcite planner on line 94. Something like: LOG.info("Using Impala Planner for the following query: " + queryCtx.getStmt()); http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@245 PS16, Line 245: Frontend Timeline (Calcite Planner)") This QueryContext is used in both cases right ? i.e if Calcite planner fails, fall back to Impala planner. So this message should be modified if the fall back occurs. http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java: http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@116 PS17, Line 116: logical plan nit: to be precise, can we say 'initial logical plan' http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@121 PS17, Line 121: optimized plan nit: to be precise, can we say 'optimized logical plan' . http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@234 PS17, Line 234: // hack to match the FrontEnd code Can you add some more context to this .. which part of the FE code is this referring to ? http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java: http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala
[Impala-ASF-CR] IMPALA-12543: Detect self-events before finishing DDL
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21029 ) Change subject: IMPALA-12543: Detect self-events before finishing DDL .. Patch Set 20: (1 comment) ps20 is a bigger refactoring where InProgressTableModification is being used in more places. I will rerun exhaustive tests overnight. http://gerrit.cloudera.org:8080/#/c/21029/19/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/21029/19/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2671 PS19, Line 2671: y { > nit: It might be worth to change this and 2 other overload of this method i Moved it inside InProgressTableModification class. -- To view, visit http://gerrit.cloudera.org:8080/21029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79 Gerrit-Change-Number: 21029 Gerrit-PatchSet: 20 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 05 Apr 2024 22:55:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12543: Detect self-events before finishing DDL
Hello Quanlong Huang, Jason Fehr, Sai Hemanth Gantasala, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21029 to look at the new patch set (#20). Change subject: IMPALA-12543: Detect self-events before finishing DDL .. IMPALA-12543: Detect self-events before finishing DDL test_iceberg_self_events has been flaky for not having tbls_refreshed_before equal to tbls_refreshed_after in-between query executions. Further investigation reveals concurrency bug due to db/table level lock is not taken during db/table self-events check (IMPALA-12461 part1). The order of ALTER TABLE operation is as follow: 1. alter table starts in CatalogOpExecutor 2. table level lock is taken 3. HMS RPC starts (CatalogOpExecutor.applyAlterTable()) 4. HMS generates the event 5. HMS RPC returns 6. table is reloaded 7. catalog version is added to inflight event list 8. table level lock is releases Meanwhile the event processor thread fetches the new event after 4 and before 7. Because of IMPALA-12461 (part 1), it can also finish self-events checking before reaching 7. Before IMPALA-12461, self-events would have needed to wait for 8. Note that this issue is only relevant for table level events, as self-events checking for partition level events still takes table lock. This patch fix the issue by adding newCatalogVersion to the table's inflight event list before updating HMS using helper class InProgressTableModification. If HMS update does not complete (ie., an exception is thrown), the new newCatalogVersion that was added is then removed. This patch also fix few smaller issues, including: - Avoid incrementing EVENTS_SKIPPED_METRIC if numFilteredEvents == 0 in MetastoreEventFactory.getFilteredEvents(). - Increment EVENTS_SKIPPED_METRIC in MetastoreTableEvent.reloadTableFromCatalog() if table is already in the middle of reloading (revealed through flaky test_skipping_older_events). - Rephrase misleading log message in MetastoreEventProcessor.getNextMetastoreEvents(). Testing: - Add TestEventProcessingWithImpala, run it with debug_action and sync_ddl dimensions. - Pass exhaustive tests. Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/DebugUtils.java M tests/custom_cluster/test_events_custom_configs.py 9 files changed, 886 insertions(+), 591 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/21029/20 -- To view, visit http://gerrit.cloudera.org:8080/21029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79 Gerrit-Change-Number: 21029 Gerrit-PatchSet: 20 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-12905: Disk-based tuple caching
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/21171 ) Change subject: IMPALA-12905: Disk-based tuple caching .. Patch Set 8: (8 comments) http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.h File be/src/exec/tuple-cache-node.h: http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.h@27 PS6, Line 27: class TupleFileReader; > Here's what I think is happening: Setting default destructor in tuple-cache-node.cc sounds like a good safety to add. http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.cc File be/src/exec/tuple-cache-node.cc: http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.cc@33 PS6, Line 33: namespace impala { > No, removed Ack http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.cc@180 PS6, Line 180: if (status.ok()) { > I don't think we need to support Reset(), so I changed it's definition to t Ack http://gerrit.cloudera.org:8080/#/c/21171/8/be/src/exec/tuple-file-read-write-test.cc File be/src/exec/tuple-file-read-write-test.cc: http://gerrit.cloudera.org:8080/#/c/21171/8/be/src/exec/tuple-file-read-write-test.cc@182 PS8, Line 182: // Read back the empy file and make sure the reader can handle it typo: empy -> empty http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.cc File be/src/exec/tuple-file-reader.cc: http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.cc@89 PS6, Line 89: > Good point, we should check after each read() call. I switch to use Kudu's Ack http://gerrit.cloudera.org:8080/#/c/21171/8/be/src/exec/tuple-file-reader.cc File be/src/exec/tuple-file-reader.cc: http://gerrit.cloudera.org:8080/#/c/21171/8/be/src/exec/tuple-file-reader.cc@99 PS8, Line 99: // Now, we know the total size of the variable-length data, and we can read ReadV already supports populating multiple kudu::Slice, should we do this as an array of 3 slices rather than splitting them up after the read? http://gerrit.cloudera.org:8080/#/c/21171/8/be/src/exec/tuple-file-writer.h File be/src/exec/tuple-file-writer.h: http://gerrit.cloudera.org:8080/#/c/21171/8/be/src/exec/tuple-file-writer.h@109 PS8, Line 109: bool aborted_ = false; Could these be a state enum instead? http://gerrit.cloudera.org:8080/#/c/21171/8/tests/custom_cluster/test_tuple_cache.py File tests/custom_cluster/test_tuple_cache.py: http://gerrit.cloudera.org:8080/#/c/21171/8/tests/custom_cluster/test_tuple_cache.py@101 PS8, Line 101: start_args=CACHE_START_ARGS + " --tuple_cache_capacity=64MB", cluster_size=1, This arg doesn't exist anymore, right? -- To view, visit http://gerrit.cloudera.org:8080/21171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf Gerrit-Change-Number: 21171 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 05 Apr 2024 20:58:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12894: Addendum: Re-enable test plain count star optimization
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21249 ) Change subject: IMPALA-12894: Addendum: Re-enable test_plain_count_star_optimization .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15786/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21249 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30629632742c0d402a6bb852a169359edac59eba Gerrit-Change-Number: 21249 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 05 Apr 2024 20:31:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12894: Addendum: Re-enable test plain count star optimization
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21249 Change subject: IMPALA-12894: Addendum: Re-enable test_plain_count_star_optimization .. IMPALA-12894: Addendum: Re-enable test_plain_count_star_optimization test_plain_count_star_optimization was disabled by IMPALA-12894 part 1, and part 2 didn't re-enable it. This patch re-enables it. Change-Id: I30629632742c0d402a6bb852a169359edac59eba --- M tests/query_test/test_iceberg.py 1 file changed, 0 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21249/1 -- To view, visit http://gerrit.cloudera.org:8080/21249 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I30629632742c0d402a6bb852a169359edac59eba Gerrit-Change-Number: 21249 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12894: Addendum: Re-enable test plain count star optimization
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21249 ) Change subject: IMPALA-12894: Addendum: Re-enable test_plain_count_star_optimization .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10497/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21249 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30629632742c0d402a6bb852a169359edac59eba Gerrit-Change-Number: 21249 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 05 Apr 2024 20:08:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12291 impala checks hdfs ranger policy
Fang-Yu Rao has posted comments on this change. ( http://gerrit.cloudera.org:8080/20221 ) Change subject: IMPALA-12291 impala checks hdfs ranger policy .. Patch Set 9: Hi all, based on our discussion so far, there are 2 approaches discussed. Any any better alternative is also appreciated! - Set the access level of a loaded table to both READ and WRITE as long as Ranger is used as the authorization provider. - Introduce a startup flag to allow the administrator to decide whether to skip the file system permissions check during table loading. There are some implementation details to consider for each approach. - How to determine whether Ranger is enabled (corresponding to the 1st approach)? It seems checking the value of the key 'dfs.namenode.inode.attributes.provider.class' in core-site.xml via the instance of Configuration as done in the patch set 9 could not be verified easily via a new test due to HDFS Ranger plug-in not being configured correctly. To be more specific, if we try to add the following configuration via https://github.com/apache/impala/blob/master/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py, the name node of HDFS could not be started. More plumbing has to be done to set up the plug-in. On the other hand, using the instance of BackendConfig allows us to add an end-to-end test to briefly verify Impala's behavior after the patch more easily. dfs.namenode.inode.attributes.provider.class org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer - What should be the default behavior with respect to setting the access level of a loaded table after this patch (corresponding to the 2nd approach)? It looks like making Impala assume the READ and WRITE access by default may be better especially in the legacy catalog mode. This relieves the Impala administrator of having to manually tweak the HDFS access control entries of the target HDFS paths that are not writable to the Impala service every time when an end user encounters such a problem. I have also collected the related tests that need to be revised if we decide to adopt the 2nd approach and make Impala assume the READ and WRITE access by default. - https://github.com/apache/impala/blob/master/tests/metadata/test_hdfs_permissions.py#L56 (TestHdfsPermissions.test_insert_into_read_only_table()). - https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L563 (TestInsertBehaviour.test_multiple_group_acls). - https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L331 (TestInsertBehaviour.test_readonly_table_dir). - https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L362 (TestInsertBehaviour.test_insert_acl_permissions). - https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L439 (TestInsertBehaviour.test_load_permissions). - https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L252 (TestInsertBehaviour.test_mixed_partition_permissions). - https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L202 (TestInsertBehaviour.test_insert_file_permissions). - https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java#L4008 (AnalyzeStmtsTest.TestLoadData). - https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java#L4042-L4047 (AnalyzeStmtsTest.TestInsert). - https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java#L110-L119 (CatalogObjectToFromThriftTest.TestPartitionedTable). -- To view, visit http://gerrit.cloudera.org:8080/20221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id33c400fbe0c918b6b65d713b09009512835a4c9 Gerrit-Change-Number: 20221 Gerrit-PatchSet: 9 Gerrit-Owner: Halim Kim Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Halim Kim Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 05 Apr 2024 18:30:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12543: Detect self-events before finishing DDL
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21029 ) Change subject: IMPALA-12543: Detect self-events before finishing DDL .. Patch Set 19: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15785/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79 Gerrit-Change-Number: 21029 Gerrit-PatchSet: 19 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 05 Apr 2024 18:24:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12543: Detect self-events before finishing DDL
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21029 ) Change subject: IMPALA-12543: Detect self-events before finishing DDL .. Patch Set 19: (3 comments) http://gerrit.cloudera.org:8080/#/c/21029/18//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21029/18//COMMIT_MSG@51 PS18, Line 51: - Pass exhaustive tests. > Rerun exhaustive tests overnight and it is now failing at some tests. I'll Done http://gerrit.cloudera.org:8080/#/c/21029/18/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/21029/18/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3321 PS18, Line 3321: ns.checkState(mod > Should replace this with modification.newVersionNumber() Done http://gerrit.cloudera.org:8080/#/c/21029/19/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/21029/19/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2671 PS19, Line 2671: addCatalogServiceIdentifiers nit: It might be worth to change this and 2 other overload of this method into static and provide way to access it via InProgressTableModification object. -- To view, visit http://gerrit.cloudera.org:8080/21029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79 Gerrit-Change-Number: 21029 Gerrit-PatchSet: 19 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 05 Apr 2024 18:05:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12543: Detect self-events before finishing DDL
Hello Quanlong Huang, Jason Fehr, Sai Hemanth Gantasala, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21029 to look at the new patch set (#19). Change subject: IMPALA-12543: Detect self-events before finishing DDL .. IMPALA-12543: Detect self-events before finishing DDL test_iceberg_self_events has been flaky for not having tbls_refreshed_before equal to tbls_refreshed_after in-between query executions. Further investigation reveals concurrency bug due to db/table level lock is not taken during db/table self-events check (IMPALA-12461 part1). The order of ALTER TABLE operation is as follow: 1. alter table starts in CatalogOpExecutor 2. table level lock is taken 3. HMS RPC starts (CatalogOpExecutor.applyAlterTable()) 4. HMS generates the event 5. HMS RPC returns 6. table is reloaded 7. catalog version is added to inflight event list 8. table level lock is releases Meanwhile the event processor thread fetches the new event after 4 and before 7. Because of IMPALA-12461 (part 1), it can also finish self-events checking before reaching 7. Before IMPALA-12461, self-events would have needed to wait for 8. Note that this issue is only relevant for table level events, as self-events checking for partition level events still takes table lock. This patch fix the issue by adding newCatalogVersion to the table's inflight event list before updating HMS. If HMS update does not complete (ie., an exception is thrown), the new newCatalogVersion that was added is then removed. This patch also fix few smaller issues, including: - Avoid incrementing EVENTS_SKIPPED_METRIC if numFilteredEvents == 0 in MetastoreEventFactory.getFilteredEvents(). - Increment EVENTS_SKIPPED_METRIC in MetastoreTableEvent.reloadTableFromCatalog() if table is already in the middle of reloading (revealed through flaky test_skipping_older_events). - Rephrase misleading log message in MetastoreEventProcessor.getNextMetastoreEvents(). Testing: - Add TestEventProcessingWithImpala, run it with debug_action and sync_ddl dimensions. - Pass exhaustive tests. Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/DebugUtils.java M tests/custom_cluster/test_events_custom_configs.py 9 files changed, 811 insertions(+), 549 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/21029/19 -- To view, visit http://gerrit.cloudera.org:8080/21029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79 Gerrit-Change-Number: 21029 Gerrit-PatchSet: 19 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-12543: Detect self-events before finishing DDL
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21029 ) Change subject: IMPALA-12543: Detect self-events before finishing DDL .. Patch Set 18: (1 comment) http://gerrit.cloudera.org:8080/#/c/21029/18/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/21029/18/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3321 PS18, Line 3321: newCatalogVersion Should replace this with modification.newVersionNumber() -- To view, visit http://gerrit.cloudera.org:8080/21029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79 Gerrit-Change-Number: 21029 Gerrit-PatchSet: 18 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 05 Apr 2024 17:06:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12905: Disk-based tuple caching
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21171 ) Change subject: IMPALA-12905: Disk-based tuple caching .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15784/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf Gerrit-Change-Number: 21171 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 05 Apr 2024 16:40:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12905: Disk-based tuple caching
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/21171 ) Change subject: IMPALA-12905: Disk-based tuple caching .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/21171/7/be/src/exec/tuple-file-writer.cc File be/src/exec/tuple-file-writer.cc: http://gerrit.cloudera.org:8080/#/c/21171/7/be/src/exec/tuple-file-writer.cc@130 PS7, Line 130: Substitute("Write of size $0 would cause $1 to exceed the maximum file size of $2", > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/21171/7/tests/custom_cluster/test_tuple_cache.py File tests/custom_cluster/test_tuple_cache.py: http://gerrit.cloudera.org:8080/#/c/21171/7/tests/custom_cluster/test_tuple_cache.py@60 PS7, Line 60: } > flake8: E123 closing bracket does not match indentation of opening bracket' Done http://gerrit.cloudera.org:8080/#/c/21171/7/tests/custom_cluster/test_tuple_cache.py@118 PS7, Line 118: @ > flake8: E303 too many blank lines (2) Done http://gerrit.cloudera.org:8080/#/c/21171/7/tests/custom_cluster/test_tuple_cache.py@173 PS7, Line 173: e > flake8: F841 local variable 'e' is assigned to but never used Done -- To view, visit http://gerrit.cloudera.org:8080/21171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf Gerrit-Change-Number: 21171 Gerrit-PatchSet: 7 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 05 Apr 2024 16:15:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12905: Disk-based tuple caching
Hello Kurt Deschler, Yida Wu, Alexey Serbin, Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21171 to look at the new patch set (#8). Change subject: IMPALA-12905: Disk-based tuple caching .. IMPALA-12905: Disk-based tuple caching This implements on-disk caching for the tuple cache. The TupleCacheNode uses the TupleFileWriter and TupleFileReader to write and read back tuples from local files. The file format uses RowBatch's standard serialization used for KRPC data streams. The TupleCacheMgr is the daemon-level structure that coordinates the state machine for cache entries, including eviction. When a writer is adding an entry, it inserts an IN_PROGRESS entry before starting to write data. This does not count towards cache capacity, because the total size is not known yet. This IN_PROGRESS entry prevents other writers from concurrently writing the same entry. If the write is successful, the entry transitions to the COMPLETE state and updates the total size of the entry. If the write is unsuccessful and a new execution might succeed, then the entry is removed. If the write is unsuccessful and won't succeed later (e.g. if the total size of the entry exceeds the max size of an entry), then it transitions to the TOMBSTONE state. TOMBSTONE entries avoid the overhead of trying to write entries that are too large. Given these states, when a TupleCacheNode is doing its initial Lookup() call, one of three things can happen: 1. It can find a COMPLETE entry and read it. 2. It can find an IN_PROGRESS/TOMBSTONE entry, which means it cannot read or write the entry. 3. It finds no entry and inserts its own IN_PROGRESS entry to start a write. The tuple cache is configured using the tuple_cache parameter, which is a combination of the cache directory and the capacity similar to the data_cache parameter. For example, /data/0:100GB uses directory /data/0 for the cache with a total capacity of 100GB. This currently supports a single directory, but it can be expanded to multiple directories later if needed. The cache eviction policy can be specified via the tuple_cache_eviction_policy parameter, which currently supports LRU or LIRS. The tuple_cache parameter cannot be specified if allow_tuple_caching=false. This contains contributions from Michael Smith, Yida Wu, and Joe McDonnell. Testing: - This adds basic custom cluster tests for the tuple cache. Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf --- M be/src/exec/CMakeLists.txt M be/src/exec/tuple-cache-node.cc M be/src/exec/tuple-cache-node.h A be/src/exec/tuple-file-read-write-test.cc A be/src/exec/tuple-file-reader.cc A be/src/exec/tuple-file-reader.h A be/src/exec/tuple-file-writer.cc A be/src/exec/tuple-file-writer.h M be/src/runtime/CMakeLists.txt M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h A be/src/runtime/tuple-cache-mgr-test.cc A be/src/runtime/tuple-cache-mgr.cc A be/src/runtime/tuple-cache-mgr.h M bin/start-impala-cluster.py M common/thrift/metrics.json A tests/custom_cluster/test_tuple_cache.py 17 files changed, 2,225 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/21171/8 -- To view, visit http://gerrit.cloudera.org:8080/21171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf Gerrit-Change-Number: 21171 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12925: Fix decimal data type for external JDBC table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21218 ) Change subject: IMPALA-12925: Fix decimal data type for external JDBC table .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21218 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c9d2e0667c42c0e52436b158e3dfe3ec14b9e3b Gerrit-Change-Number: 21218 Gerrit-PatchSet: 10 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: gaurav singh Gerrit-Comment-Date: Fri, 05 Apr 2024 09:16:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12925: Fix decimal data type for external JDBC table
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21218 ) Change subject: IMPALA-12925: Fix decimal data type for external JDBC table .. IMPALA-12925: Fix decimal data type for external JDBC table Decimal type is a primitive data type for Impala. Current code returns wrong values for columns with decimal data type in external JDBC tables. This patch fixes wrong values returned from JDBC data source, and supports pushing down decimal type of predicates to remote database and remote Impala. The decimal precision and scale of the columns in external JDBC table must be no less than the decimal precision and scale of the corresponding columns in the table of remote database. Otherwise, Impala fails with an error since it may cause truncation of decimal data. Testing: - Added Planner test for pushing down decimal type of predicates. - Added end-to-end unit-tests for tables with decimal type of columns for Postgres, MySQL, and Impala-to-Impala. - Passed core-tests. Change-Id: I8c9d2e0667c42c0e52436b158e3dfe3ec14b9e3b Reviewed-on: http://gerrit.cloudera.org:8080/21218 Reviewed-by: Michael Smith Reviewed-by: Abhishek Rawat Tested-by: Impala Public Jenkins --- M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java M fe/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java M fe/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M testdata/bin/clean-mysql-env.sh M testdata/bin/create-ext-data-source-table.sql M testdata/bin/load-ext-data-sources.sh M testdata/bin/setup-mysql-env.sh M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-query/queries/QueryTest/impala-ext-jdbc-tables-predicates.test M testdata/workloads/functional-query/queries/QueryTest/impala-ext-jdbc-tables.test M testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test M testdata/workloads/functional-query/queries/QueryTest/mysql-ext-jdbc-tables.test 14 files changed, 576 insertions(+), 21 deletions(-) Approvals: Michael Smith: Looks good to me, but someone else must approve Abhishek Rawat: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/21218 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I8c9d2e0667c42c0e52436b158e3dfe3ec14b9e3b Gerrit-Change-Number: 21218 Gerrit-PatchSet: 11 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: gaurav singh