[Impala-ASF-CR] IMPALA-12465: Unicode column name support
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 13: (2 comments) > Patch Set 13: Verified-1 > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9985/ To reproduce the compilation error, you need to execute "export USE_APACHE_HIVE=true" before using buildall.sh. http://gerrit.cloudera.org:8080/#/c/20506/13/fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java File fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java: http://gerrit.cloudera.org:8080/#/c/20506/13/fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java@955 PS13, Line 955: MetaStoreServerUtils This class doesn't exist in apache-hive-3.1.3. I think we should still use MetaStoreUtils#validateColumnName(). https://github.com/apache/hive/blob/rel/release-3.1.3/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L505-L507 If the implementation of this method in two MetastoreShim classes are the same, we can put this method in Hive3MetastoreShimBase. http://gerrit.cloudera.org:8080/#/c/20506/13/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java: http://gerrit.cloudera.org:8080/#/c/20506/13/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@125 PS13, Line 125: import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils; nit: already imported at L87 -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 08 Dec 2023 07:36:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20648 ) Change subject: IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14618/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37055990be49e91462ebc98aa97009ca768a0072 Gerrit-Change-Number: 20648 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 08 Dec 2023 03:01:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20485 ) Change subject: IMPALA-10949: Improve batching logic of partition events .. Patch Set 3: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/20485/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20485/3//COMMIT_MSG@7 PS3, Line 7: IMPALA-10949: Improve batching logic of partition : events nit: put the title in one line http://gerrit.cloudera.org:8080/#/c/20485/3/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/20485/3/tests/custom_cluster/test_events_custom_configs.py@403 PS3, Line 403: EventProcessorUtils.wait_for_event_processing(self) Can we also verify all events are skipped? The metric of "events-skipped" can be checked. -- To view, visit http://gerrit.cloudera.org:8080/20485 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 Gerrit-Change-Number: 20485 Gerrit-PatchSet: 3 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 08 Dec 2023 02:34:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12443: Add catalog timeline for all ddl profiles
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20491 ) Change subject: IMPALA-12443: Add catalog timeline for all ddl profiles .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14617/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20491 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5 Gerrit-Change-Number: 20491 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 08 Dec 2023 02:35:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/20648 ) Change subject: IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/20648/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20648/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1556 PS3, Line 1556: wasEventSyncTurnedOn > Wouldn't it be better to move this block earlier, before self event check? In the above scenario, first (2) is executed, and while reloading table metadata it'll fetch altered table property from HMS. At (3), if the table is loaded, it'll mark the table as stale (See L#809). When (4) arrives, inflight versions will not have (2) since the table was marked as stale in the previous step, so the event is processed and marked that event sync is enabled. If there is no external event before (4), then the event would have been skipped as self-event. I have added an end-to-end test to mimic this scenario. http://gerrit.cloudera.org:8080/#/c/20648/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1557 PS3, Line 1557: handleEventSyncTurnedOn(); > I think that moving this block to a separate function like handleEventSyncT Ack http://gerrit.cloudera.org:8080/#/c/20648/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1562 PS3, Line 1562: oad. > Are we prepared for throwing an exception here? Using getTableNoThrow seems Good point. will implement your suggestion -- To view, visit http://gerrit.cloudera.org:8080/20648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37055990be49e91462ebc98aa97009ca768a0072 Gerrit-Change-Number: 20648 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 08 Dec 2023 02:22:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing
Hello Quanlong Huang, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20648 to look at the new patch set (#4). Change subject: IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing .. IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing Currently we require a global invalidate to reset the events processor if the events sync is re-enabled on a table from HMS. This patch eliminates the need to reset the catalog cache when events sync is re-enabled. Implementation details: when events sync is re-enabled on table via HMS 1) If the table exists in Impala, a) We can just invalidate the table, if the current event is greater than the create event id of the table, so that it is reloaded the first time query accesses it. b) Otherwise we can just ignore the event. 2) If the table doesn't exist in Impala, create a Incomplete table, if there is no entry in the event delete log for this table. Note: If the eventSync is disabled on a table, for all subsequent table events, if the table object is loaded, mark the table as stale, so that it is reloaded the next time query accesses it. Testing: 1) manually verified few scenarios. 2) Added test case for the above scenarios. Change-Id: I37055990be49e91462ebc98aa97009ca768a0072 --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M tests/custom_cluster/test_events_custom_configs.py 3 files changed, 157 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/20648/4 -- To view, visit http://gerrit.cloudera.org:8080/20648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37055990be49e91462ebc98aa97009ca768a0072 Gerrit-Change-Number: 20648 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-12443: Add catalog timeline for all ddl profiles
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20491 to look at the new patch set (#4). Change subject: IMPALA-12443: Add catalog timeline for all ddl profiles .. IMPALA-12443: Add catalog timeline for all ddl profiles This is a follow-up work of IMPALA-12024 where we add the catalog timeline for CreateTable statements. Using the same mechanism, this patch adds catalog timeline for all DDL/DML profiles, including REFRESH and INSERT. The goal is to add timeline markers after each step that could be blocked, e.g. acquiring locks, external RPCs. Tried to add some constant strings for widely used events, e.g. "Fetched table from Metastore". Didn't do so for events that only occurs once. Most of the catalog methods have a new argument for tracking the execution timeline. To avoid adding null checks everywhere, for code paths that don't need a catalog profile, e.g. EventProcessor, creates an unused catalogTimeline as the argument. We can use them in future works, e.g. expose execution timeline of a slow processing on an HMS event. This patch also removes some unused overloads of HdfsTable#load() and HdfsTable#reloadPartitionsFromNames(). Tests: - Add e2e test to verify the catalog timeline in some DDLs. Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5 --- M be/src/exec/catalog-op-executor.cc M be/src/exec/catalog-op-executor.h M be/src/service/client-request-state.cc M be/src/service/impala-server.cc M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogdTableInvalidator.java M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/MaterializedViewHdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/View.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/EventSequence.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java M tests/query_test/test_observability.py 33 files changed, 924 insertions(+), 603 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/20491/4 -- To view, visit http://gerrit.cloudera.org:8080/20491 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5 Gerrit-Change-Number: 20491 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14616/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 6 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 08 Dec 2023 00:55:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 6: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 6 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 08 Dec 2023 00:34:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc File be/src/codegen/llvm-codegen-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc@39 PS5, Line 39: const int ENGINE_CACHE_SIZE = 990; > This is larger than the value that was here before. Was the test failing un Yeah, this is the real number of the cache containing one compiled function. The earlier count was misleading as the cache was empty. I've adjusted the testcase, and added LlvmCodeGenCacheTest::CheckObjCacheExists to confirm the existence of contents within the cache. http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc@176 PS5, Line 176: // The function is to create and return a CodeGenObjectCache which contains a specific > nit: "contains" not "contians" Done -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 6 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 08 Dec 2023 00:28:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory which largely exceeds our memory estimates and it is very hard to predict. This patch addresses this issue by using llvm::ObjectCache for codegen caching. In our case, each execution engine would have only one module, after the compilation of the module, the compiled codegened functions of the module would be set to the execution engine, therefore funtions could be used by Impala. During function compilation within the module, if an ObjectCache is set to the execution engine, the compiled codegened functions would be also written into the cache. This way, if we keep the cache, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled codegened functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+--+---++-++---++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++---++---++-+---+ | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.48 | 0.46| +5.16% | 4.31% | 4.78%| 8 | +3.51% | 2.11| 2.22 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 1.28 | 1.25| +1.76% | 1.83% | 1.88%| 8 | +3.53% | 1.34| 1.88 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.77 | 0.75| +3.11% | 3.88% | 2.75%| 8 | +0.79% | 0.83| 1.81 | | TPCH(1) | TPCH-Q7 | parquet / none / none | 0.72 | 0.71| +1.89% | 3.48% | 3.15%| 8 | +1.61% | 0.83| 1.13 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.43 | 0.41| +3.40% | 1.21% | 6.11%| 8 | +0.05% | 0.45| 1.54 | | TPCH(1) | TPCH-Q10 | parquet / none / none | 0.74 | 0.73| +2.04% | 0.75% | 2.57%| 8 | +1.13% | 2.11| 2.16 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.55 | 0.54| +1.42% | 2.95% | 1.14%| 8 | +0.19% | 0.57| 1.25 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.25 | 0.25| +0.37% | 2.41% | 3.09%| 8 | +0.82% | 0.45| 0.27 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.62 | 0.61| +0.92% | 3.17% | 1.53%| 8 | +0.05% | 0.19| 0.73 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.42 | 0.42| +0.37% | 1.24% | 1.08%| 8 | +0.03% | 0.19| 0.63 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.72 | 0.72| +0.19% | 3.08% | 3.23%| 8 | +0.07% | 0.57| 0.12 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 1.20 | 1.19| +0.04% | 0.12% | 0.15%| 8 | +0.03% | 0.57| 0.53 | | TPCH(1) | TPCH-Q9 | parquet / none / none | 1.13 | 1.13| +0.02% | 2.05% | 2.04%| 8 | -0.01% | -0.57 | 0.02 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.60 | 0.60| +0.03% | 3.71% | 3.99%| 8 | -0.04% | -0.06 | 0.01 | | TPCH(1) | TPCH-Q2 | parquet / none / none | 0.44 | 0.44| -0.70% | 8.56% | 7.42%| 8 | +0.61% | 0.45| -0.18 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.31 | 0.31| -0.07% | 1.73% | 1.56%| 8 | -0.06% | -1.34 | -0.09 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.48 | 0.48| -0.25% | 1.20% | 1.47%| 8 | -0.09% | -0.57 | -0.37 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.54 | 0.54| -0.86% | 1.90% | 4.48%| 8 | +0.11% | 0.32| -0.50 | | TPCH(1) | TPCH-Q21 | parquet / none / none | 1.35 | 1.37| -1.56% | 3.11% | 2.55%| 8 | -0.19% | -0.70 | -1.11 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.26 | 0.27| -1.74% | 9.53% | * 13.61% * | 8
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc File be/src/codegen/llvm-codegen-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc@39 PS5, Line 39: const int ENGINE_CACHE_SIZE = 990; This is larger than the value that was here before. Was the test failing until this patch set? http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc@176 PS5, Line 176: // The function to create and return a CodeGenObjectCache which contians a specific nit: "contains" not "contians" -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 5 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 08 Dec 2023 00:07:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 13: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9985/ -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 07 Dec 2023 23:41:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20681 ) Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu to local time .. Patch Set 12: Code-Review+1 (3 comments) Great work! Only minor comments. http://gerrit.cloudera.org:8080/#/c/20681/12/common/function-registry/impala_functions.py File common/function-registry/impala_functions.py: http://gerrit.cloudera.org:8080/#/c/20681/12/common/function-registry/impala_functions.py@257 PS12, Line 257: [['to_utc_timestamp'], 'TIMESTAMP', ['TIMESTAMP', 'STRING', 'BOOLEAN'], :"impala::TimestampFunctions::ToUtcUnambiguous"], Maybe this should rather go to the invisible functions list? I am not sure here - while I would prefer to not make this a supported functions, it can be useful for users in some cases. http://gerrit.cloudera.org:8080/#/c/20681/12/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: http://gerrit.cloudera.org:8080/#/c/20681/12/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@744 PS12, Line 744:*/ Can you mention that now this can return a list for timestamps? http://gerrit.cloudera.org:8080/#/c/20681/12/testdata/workloads/functional-query/queries/QueryTest/kudu_predicate_with_timestamp_conversion.test File testdata/workloads/functional-query/queries/QueryTest/kudu_predicate_with_timestamp_conversion.test: http://gerrit.cloudera.org:8080/#/c/20681/12/testdata/workloads/functional-query/queries/QueryTest/kudu_predicate_with_timestamp_conversion.test@18 PS12, Line 18: order by id here and at other queries: "order by id" is not necessary - the rows in the results section + the actual results are ordered before comparison by default in EE tests otherwise most tests would need order by, as Impala has no guarantee for the order of returned rows if there are multiple files in a table -- To view, visit http://gerrit.cloudera.org:8080/20681 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3 Gerrit-Change-Number: 20681 Gerrit-PatchSet: 12 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Thu, 07 Dec 2023 22:39:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20485 ) Change subject: IMPALA-10949: Improve batching logic of partition events .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14615/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20485 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 Gerrit-Change-Number: 20485 Gerrit-PatchSet: 3 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 07 Dec 2023 20:01:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20485 ) Change subject: IMPALA-10949: Improve batching logic of partition events .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20485 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 Gerrit-Change-Number: 20485 Gerrit-PatchSet: 3 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 07 Dec 2023 19:49:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events
Hello Quanlong Huang, k.venureddy2...@gmail.com, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20485 to look at the new patch set (#3). Change subject: IMPALA-10949: Improve batching logic of partition events .. IMPALA-10949: Improve batching logic of partition events Currently the batching logic for partitions evaluates self-event check by looking at the event id of the last event in the batch. This patch improves the batching logic by evaluating table's lastSyncEventId to the current event's event id and decide whether to batch the event or not. This way we can have fewer batch sizes and avoid self events if any, beforehand. Testing: Added an end-to-end test to verfiy the batch size for ALTER_PARTITION batch events. Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M tests/custom_cluster/test_events_custom_configs.py 2 files changed, 38 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/20485/3 -- To view, visit http://gerrit.cloudera.org:8080/20485 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 Gerrit-Change-Number: 20485 Gerrit-PatchSet: 3 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/20485 ) Change subject: IMPALA-10949: Improve batching logic of partition events .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/20485/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20485/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@741 PS2, Line 741: Older > the name doesn't match the semantics, the function returns true if the even Ack http://gerrit.cloudera.org:8080/#/c/20485/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@747 PS2, Line 747: > Exception handling could be skipped by using getTableNoThrow instead of get Ack -- To view, visit http://gerrit.cloudera.org:8080/20485 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 Gerrit-Change-Number: 20485 Gerrit-PatchSet: 3 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 07 Dec 2023 19:33:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 07 Dec 2023 19:13:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 13: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9985/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 07 Dec 2023 19:13:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12313: (part 2) Limited UPDATE support for Iceberg tables
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/20677 ) Change subject: IMPALA-12313: (part 2) Limited UPDATE support for Iceberg tables .. Patch Set 9: (8 comments) http://gerrit.cloudera.org:8080/#/c/20677/9/be/src/exec/iceberg-delete-sink.h File be/src/exec/iceberg-delete-sink.h: http://gerrit.cloudera.org:8080/#/c/20677/9/be/src/exec/iceberg-delete-sink.h@81 PS9, Line 81: /// Verifies that the row batch does not contain duplicated rows. Could you add an explanation of intent, it is not clear for me why this is needed. http://gerrit.cloudera.org:8080/#/c/20677/9/be/src/exec/iceberg-delete-sink.h@82 PS9, Line 82: VerifyRowBatch Nit: For readability could you rename to something like: VerifyRowsNotDuplicated/VerifyRowDistinctiveness http://gerrit.cloudera.org:8080/#/c/20677/9/be/src/exec/multi-table-sink.cc File be/src/exec/multi-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/20677/9/be/src/exec/multi-table-sink.cc@52 PS9, Line 52: tsinkBase nit: tsink_base http://gerrit.cloudera.org:8080/#/c/20677/9/be/src/runtime/dml-exec-state.h File be/src/runtime/dml-exec-state.h: http://gerrit.cloudera.org:8080/#/c/20677/9/be/src/runtime/dml-exec-state.h@72 PS9, Line 72: void UpdatePartition(const std::string& partition_name, : int64_t num_rows_delta, const DmlStatsPB* insert_stats, : bool is_delete = false); nit: should fit into 2 lines http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java: http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java@120 PS9, Line 120: id nit: ++nextTableId_ and return nextTableId_ http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/analysis/IcebergModifyImpl.java File fe/src/main/java/org/apache/impala/analysis/IcebergModifyImpl.java: http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/analysis/IcebergModifyImpl.java@87 PS9, Line 87: public List getDeletePartitionExprs(Analyzer analyzer) : throws AnalysisException { nit: should fit into 1 line http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/analysis/IcebergModifyImpl.java@94 PS9, Line 94: public List getDeleteResultExprs(Analyzer analyzer) : throws AnalysisException { nit: should fit into 1 line http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/planner/Planner.java@185 PS9, Line 185: nit: indentation -- To view, visit http://gerrit.cloudera.org:8080/20677 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff0ef6075a2b6ebe130d15daa389ac1a505a7a08 Gerrit-Change-Number: 20677 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 07 Dec 2023 17:55:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20648 ) Change subject: IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/20648/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20648/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1556 PS3, Line 1556: wasEventSyncTurnedOn Wouldn't it be better to move this block earlier, before self event check? I am unsure about the handling of enabling metadata sync as self event. Assume the following scenario: 0. event processing starts disabled for a table 1. another component changes a random table prop with ALTER TABLE 2. before processing the event from 1 impala enables hms sync on the table 3. the event from 1 arrives, but catalogd skips it as hms sync was still disabled at that point 4. the event from 2 arrives, but it is skipped at it is a self event I think that at 4 catalogd should mark the table as stale. as it didn't process event from 1 http://gerrit.cloudera.org:8080/#/c/20648/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1557 PS3, Line 1557: // check if the table exists or not. 1) if the table doesn't exist create an I think that moving this block to a separate function like handleEventSyncTurnedOn() would make the main path clearer. http://gerrit.cloudera.org:8080/#/c/20648/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1562 PS3, Line 1562: getTable Are we prepared for throwing an exception here? Using getTableNoThrow seems more fitting to avoid exception when the DB also doesn't exist (not sure if this is possible at this point) -- To view, visit http://gerrit.cloudera.org:8080/20648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37055990be49e91462ebc98aa97009ca768a0072 Gerrit-Change-Number: 20648 Gerrit-PatchSet: 3 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 07 Dec 2023 15:41:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20760 ) Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables. .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14614/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315 Gerrit-Change-Number: 20760 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 07 Dec 2023 15:36:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20760 to look at the new patch set (#3). Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables. .. IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables. Part 2 had some limitations, i.e. it could not update Iceberg tables if any of the following were true: * UPDATE value of partitioning column * UPDATE table that went through partition evolution * Table has SORT BY properties The problem with partitions is that the delete record and new data record might belong to different partitions and records are shuffled across based on the partitions of the delete records, hence the data files might not get written efficiently. The problem with SORT BY properties, is that we need to write the position delete files ordered by (file_path, position). To address the above problems, this patch set introduces a new backend operator: IcebergBufferedDeleteSink. This new operator extracts and aggregates the delete record information from the incoming row batches, then in FlushFinal it orders the position delete records and writes them out to files. This mechanism is similar to Hive's approach: https://github.com/apache/hive/pull/3251 Now records can get shuffled around based on the new data records' partition values, and the SORT operator sorts the records based on the SORT BY properties. There's only one case we don't allow the UPDATE statement: * UPDATE partition column AND * Right-hand side of assignment is non-constant expression AND * UPDATE statement has a JOIN When all of the above conditions meet, it would be possible to have an incorrect JOIN condition that has multiple matches for the data records, then the duplicated records would be shuffled independently (based on the new partition value) to different backend SINKs, and the different backend SINK would not be able to detect the duplicates. If any of the above conditions was false, then the duplicated records would be shuffled together to the same SINK, that could do the duplicate check. This patch also moves some code from IcebergDeleteSink to the newly introduced IcebergDeleteSinkBase. TODO: * planner tests * e2e tests * concurrent tests * Impala/Hive interop tests Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315 --- M be/src/exec/CMakeLists.txt M be/src/exec/data-sink.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h A be/src/exec/iceberg-buffered-delete-sink.cc A be/src/exec/iceberg-buffered-delete-sink.h A be/src/exec/iceberg-delete-sink-base.cc A be/src/exec/iceberg-delete-sink-base.h A be/src/exec/iceberg-delete-sink-config.cc A be/src/exec/iceberg-delete-sink-config.h M be/src/exec/iceberg-delete-sink.cc M be/src/exec/iceberg-delete-sink.h M be/src/exec/table-sink-base.cc M be/src/exec/table-sink-base.h M be/src/exprs/slot-ref.h M be/src/runtime/dml-exec-state.h M common/thrift/DataSinks.thrift M fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java M fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyImpl.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java A fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java M fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-update.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test 29 files changed, 1,168 insertions(+), 323 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/20760/3 -- To view, visit http://gerrit.cloudera.org:8080/20760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315 Gerrit-Change-Number: 20760 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 13: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 07 Dec 2023 14:51:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14613/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 07 Dec 2023 14:43:25 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-12443: Add catalog timeline for all ddl profiles
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20491 ) Change subject: WIP IMPALA-12443: Add catalog timeline for all ddl profiles .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14612/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20491 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5 Gerrit-Change-Number: 20491 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 07 Dec 2023 14:26:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 13: (1 comment) > Uploaded patch set 13. http://gerrit.cloudera.org:8080/#/c/20506/12/tests/metadata/test_column_unicode.py File tests/metadata/test_column_unicode.py: http://gerrit.cloudera.org:8080/#/c/20506/12/tests/metadata/test_column_unicode.py@35 PS12, Line 35: cls.ImpalaTestMatrix.add_dimension(create_client_protocol_dimension()) > This test still runs "for different file formats" which is completely redun Done -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 07 Dec 2023 14:16:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
pranav.lo...@cloudera.com has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. IMPALA-12465: Unicode column name support Impala depends on Hive functions for column name validation and uses validateName() function for the same. Since Hive already supports unicode column names, the patch just updates the column name validation function to validateColumnName(). validateName() checks for a certain conformance based on pattern matching standards while validateColumnName() places no restrictions on column names at the Metadata level. Testing: The support is tested and cross-checked with Hive. The tests can be found in unicode-column-name.test. Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 --- M fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java A testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test A tests/metadata/test_column_unicode.py M tests/shell/test_shell_interactive.py 6 files changed, 374 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/20506/13 -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] WIP IMPALA-12443: Add catalog timeline for all ddl profiles
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20491 to look at the new patch set (#3). Change subject: WIP IMPALA-12443: Add catalog timeline for all ddl profiles .. WIP IMPALA-12443: Add catalog timeline for all ddl profiles This patch adds catalog timeline for all DDL/DML profiles, including REFRESH and INSERT. The goal is to add timeline markers after each step that could be blocked, e.g. acquiring locks, external RPCs. Removed some unused overloads of HdfsTable#load() and HdfsTable#reloadPartitionsFromNames(). Tried to add some constant strings for widely used events, e.g. "Fetched table from Metastore". Didn't do so for events that only occurs once. Most of the catalog methods have a new argument for tracking the execution timeline. To avoid adding null checks everywhere, for code paths that don't have a catalog profile, creates an unused catalogTimeline as the argument. Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5 --- M be/src/exec/catalog-op-executor.cc M be/src/exec/catalog-op-executor.h M be/src/service/client-request-state.cc M be/src/service/impala-server.cc M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogdTableInvalidator.java M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/MaterializedViewHdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/View.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/EventSequence.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 28 files changed, 783 insertions(+), 544 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/20491/3 -- To view, visit http://gerrit.cloudera.org:8080/20491 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5 Gerrit-Change-Number: 20491 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] WIP IMPALA-12443: Add catalog timeline for all ddl profiles
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20491 ) Change subject: WIP IMPALA-12443: Add catalog timeline for all ddl profiles .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/20491/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20491/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1200 PS3, Line 1200: fileFormatParams.getFile_format(), numUpdatedPartitions, catalogTimeline); line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/20491 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5 Gerrit-Change-Number: 20491 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 07 Dec 2023 14:00:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14611/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 5 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 07 Dec 2023 13:52:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/20506/12/tests/metadata/test_column_unicode.py File tests/metadata/test_column_unicode.py: http://gerrit.cloudera.org:8080/#/c/20506/12/tests/metadata/test_column_unicode.py@35 PS12, Line 35: cls.ImpalaTestMatrix.add_dimension(create_client_protocol_dimension()) This test still runs "for different file formats" which is completely redundant because the test file already contains different file formats, so the exact same test will run multiple times. -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 12 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 07 Dec 2023 13:46:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG@17 PS4, Line 17: after the compila > Could you clarify the relationship of the module and exec engine here? I.e. Yes. Added some comments. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 5 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 07 Dec 2023 13:38:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12313: (part 2) Limited UPDATE support for Iceberg tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20677 ) Change subject: IMPALA-12313: (part 2) Limited UPDATE support for Iceberg tables .. Patch Set 9: (4 comments) I went through this patch mostly for my own education. Left some minor questions or comments, nothing serious. Great work Zoli! http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/analysis/ModifyImpl.java File fe/src/main/java/org/apache/impala/analysis/ModifyImpl.java: http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/analysis/ModifyImpl.java@54 PS9, Line 54: abstract void substituteResultExprs(ExprSubstitutionMap smap, Analyzer analyzer); nit: could you add a comment for this function http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/util/IcebergUtil.java File fe/src/main/java/org/apache/impala/util/IcebergUtil.java: http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@1214 PS9, Line 1214: private static Expr getIcebergPartitionTransformExpr(IcebergPartitionField partField, Peter's DROP PARTITION patch added some new functionality for converting strings into partition values (?) if I'm not mistaken. Just mentioning it here to see if any of those could be re-used. http://gerrit.cloudera.org:8080/#/c/20677/9/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test File testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test: http://gerrit.cloudera.org:8080/#/c/20677/9/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test@753 PS9, Line 753: AuthorizationException: User '$USER' does not have privileges to access: functional_parquet.iceberg_v2_delete_positional Isn't the error msg misleading. It says the user doesn't have permissions on the table, but in fact the issue is that there is column masking on the table. This use case anyway made me think: wouldn't it make sense to allow the user to update these tables if they don't change the masked columns? I can understand that from implementation pow this could be difficult. http://gerrit.cloudera.org:8080/#/c/20677/9/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test File testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test: http://gerrit.cloudera.org:8080/#/c/20677/9/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test@647 PS9, Line 647: masked tables nit: I guess these are not masked tables, but tables with row filtering. -- To view, visit http://gerrit.cloudera.org:8080/20677 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff0ef6075a2b6ebe130d15daa389ac1a505a7a08 Gerrit-Change-Number: 20677 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 07 Dec 2023 13:36:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 5: (15 comments) http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28 PS3, Line 28: compared to the previous use of ExecutionEngine. Post-change, > Doesn't the benchmarking framework print a grand total? If not, we can let It seems our benchmarking framework doesn't print the total. Will double check and file a jira if we don't have it. http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG@18 PS4, Line 18: the executi > Why are they pre-compiled? Aren't they compiled already? Or you mean they a I think the "pre-compiled" means it is compiled in the last round and ready for use. Yeah, maybe call it "compiled" is better, because in this case it means the first round to write the compiled stuff into the cache. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.h@164 PS4, Line 164: cached_engine > Thinking about this, wouldn't 'cached_engine...' express the meaning better Done http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.cc File be/src/codegen/llvm-codegen-cache.cc: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.cc@155 PS4, Line 155: y > I think removing the negation and inverting the branches is better, one les Done http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc File be/src/codegen/llvm-codegen-object-cache.cc: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@39 PS4, Line 39: uffer().size(); > Is a copy necessary here? Can't we move the buffer somehow? CodeGenObjectCache doesn't have the ownership of ObjBuffer, the llvm engine may use the ObjBuffer later, so it is safer to make a copy. (And MemoryBufferRef may not allow to move the real Buffer, the example I can see is to do a copy) http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@42 PS4, Line 42: // can conserve memory. > Shouldn't we add a DCHECK here? If this happens it is a bug. Done http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@57 PS4, Line 57: // ID. If the ID doesn't match the one in the object cache, a warning is logged for > Will it always copy the buffer? Is there a way to access it without copying >From the definition of the interface getObject(), "Returns a pointer to a >newly allocated MemoryBuffer that contains the object which corresponds with >Module M", it would be safer to return the copy of the buffer. >https://llvm.org/doxygen/classllvm_1_1ObjectCache.html#details http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@60 PS4, Line 60: std::lock_guard l(mutex_); > Shouldn't we add a DCHECK here? If this happens it is a bug. Done http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@63 PS4, Line 63: << cached_obj_->getMemBufferRef().getBufferSize(); > Can this function be called when 'key_' is empty? Isn't that a bug? If there is no cache yet, would return nullptr, for example, the llvm engine would look up the cache to see if it contains the pre-compiled module, if it doesn't contain any, the llvm engine would proceed to compile the module and then write compiled module to the cache. This happens in cache look up and doesn't hit any. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@330 PS4, Line 330: > See comment on llvm-codegen-cache.h:164. Changed the one in llvm-codegen-cache.h, but would prefer to keep this, because this one is the cache for llvm engine to write the compiled functions to, it doesn't have to be anything "cached". Added some comments. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@925 PS4, Line 925: > Do I understand it correctly that this is for writing to the cache (storing Yes, we can say that the engine_cache_ is for writing, and engine_cache_cached_ is for reading. Added some comments. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@929 PS4, Line 929: > We could extend the comment with what is written on llvm-codegen.cc:1499, i Added some comments. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.cc@1192 PS4,
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory which largely exceeds our memory estimates and it is very hard to predict. This patch addresses this issue by using llvm::ObjectCache for codegen caching. In our case, each execution engine would have only one module, after the compilation of the module, the compiled codegened functions of the module would be set to the execution engine, therefore funtions could be used by Impala. During function compilation within the module, if an ObjectCache is set to the execution engine, the compiled codegened functions would be also written into the cache. This way, if we keep the cache, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled codegened functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+--+---++-++---++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++---++---++-+---+ | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.48 | 0.46| +5.16% | 4.31% | 4.78%| 8 | +3.51% | 2.11| 2.22 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 1.28 | 1.25| +1.76% | 1.83% | 1.88%| 8 | +3.53% | 1.34| 1.88 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.77 | 0.75| +3.11% | 3.88% | 2.75%| 8 | +0.79% | 0.83| 1.81 | | TPCH(1) | TPCH-Q7 | parquet / none / none | 0.72 | 0.71| +1.89% | 3.48% | 3.15%| 8 | +1.61% | 0.83| 1.13 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.43 | 0.41| +3.40% | 1.21% | 6.11%| 8 | +0.05% | 0.45| 1.54 | | TPCH(1) | TPCH-Q10 | parquet / none / none | 0.74 | 0.73| +2.04% | 0.75% | 2.57%| 8 | +1.13% | 2.11| 2.16 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.55 | 0.54| +1.42% | 2.95% | 1.14%| 8 | +0.19% | 0.57| 1.25 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.25 | 0.25| +0.37% | 2.41% | 3.09%| 8 | +0.82% | 0.45| 0.27 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.62 | 0.61| +0.92% | 3.17% | 1.53%| 8 | +0.05% | 0.19| 0.73 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.42 | 0.42| +0.37% | 1.24% | 1.08%| 8 | +0.03% | 0.19| 0.63 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.72 | 0.72| +0.19% | 3.08% | 3.23%| 8 | +0.07% | 0.57| 0.12 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 1.20 | 1.19| +0.04% | 0.12% | 0.15%| 8 | +0.03% | 0.57| 0.53 | | TPCH(1) | TPCH-Q9 | parquet / none / none | 1.13 | 1.13| +0.02% | 2.05% | 2.04%| 8 | -0.01% | -0.57 | 0.02 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.60 | 0.60| +0.03% | 3.71% | 3.99%| 8 | -0.04% | -0.06 | 0.01 | | TPCH(1) | TPCH-Q2 | parquet / none / none | 0.44 | 0.44| -0.70% | 8.56% | 7.42%| 8 | +0.61% | 0.45| -0.18 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.31 | 0.31| -0.07% | 1.73% | 1.56%| 8 | -0.06% | -1.34 | -0.09 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.48 | 0.48| -0.25% | 1.20% | 1.47%| 8 | -0.09% | -0.57 | -0.37 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.54 | 0.54| -0.86% | 1.90% | 4.48%| 8 | +0.11% | 0.32| -0.50 | | TPCH(1) | TPCH-Q21 | parquet / none / none | 1.35 | 1.37| -1.56% | 3.11% | 2.55%| 8 | -0.19% | -0.70 | -1.11 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.26 | 0.27| -1.74% | 9.53% | * 13.61% * | 8
[Impala-ASF-CR] IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/20759 ) Change subject: IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns .. Patch Set 2: (7 comments) Left some minor comments, but the code looks good to me. http://gerrit.cloudera.org:8080/#/c/20759/2/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h: http://gerrit.cloudera.org:8080/#/c/20759/2/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@126 PS2, Line 126: Recusrive Recursive http://gerrit.cloudera.org:8080/#/c/20759/2/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@130 PS2, Line 130: inot into http://gerrit.cloudera.org:8080/#/c/20759/2/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/20759/2/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@87 PS2, Line 87: // Create field accessors : for (SlotDescriptor* slot_desc: tuple_desc_->slots()) { : if (slot_desc->type().IsStructType()) { : // Recursively get the field ids for struct fields : RETURN_IF_ERROR(CollectStructFieldAccessors(env, slot_desc)); : } else if (slot_desc->col_path().size() > 1) { : // Slot that is child of a struct without tuple, can occur when a struct member is : // in the select list : int root_type_index = slot_desc->col_path()[0]; : ColumnType current_type = : tuple_desc_->table_desc()->col_descs()[root_type_index].type(); : for (int i = 1; i < slot_desc->col_path().size() - 1; ++i) { : current_type = current_type.children[slot_desc->col_path()[i]]; : } : int field_id = current_type.field_ids[slot_desc->col_path().back()]; : RETURN_IF_ERROR(AddAccessorForFieldId(env, field_id, slot_desc->id())); : } else { : // For primitive types outside STRUCT use the ColumnDescriptors : int field_id = tuple_desc_->table_desc()->GetColumnDesc(slot_desc).field_id(); : RETURN_IF_ERROR(AddAccessorForFieldId(env, field_id, slot_desc->id())); : } : } nit: this could be moved to its own method. http://gerrit.cloudera.org:8080/#/c/20759/2/be/src/exec/iceberg-metadata/iceberg-row-reader.h File be/src/exec/iceberg-metadata/iceberg-row-reader.h: http://gerrit.cloudera.org:8080/#/c/20759/2/be/src/exec/iceberg-metadata/iceberg-row-reader.h@44 PS2, Line 44: Materlilize Materialize http://gerrit.cloudera.org:8080/#/c/20759/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/20759/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@230 PS2, Line 230: rootTable instanceof IcebergMetadataTable) return; : if (!(rootTable instanceof FeFsTable nit: we could have a helper method for readability like 'TableTypeSupportsStruct(FeTable tbl)' http://gerrit.cloudera.org:8080/#/c/20759/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java: http://gerrit.cloudera.org:8080/#/c/20759/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java@73 PS2, Line 73: "Adding column: \"" + col.getName() + "\" with type: \"" + col.getType() + : "\" to metadata table." LOG.trace() supports formatting with placeholders {}. http://gerrit.cloudera.org:8080/#/c/20759/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/20759/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@496 PS2, Line 496: Can we also add a SELECT * query with EXPAND_COMPLEX_TYPES=true? -- To view, visit http://gerrit.cloudera.org:8080/20759 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273 Gerrit-Change-Number: 20759 Gerrit-PatchSet: 2 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 07 Dec 2023 12:27:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14610/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 12 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 07 Dec 2023 11:14:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/20753 ) Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg tables .. Patch Set 1: (17 comments) Left a few comments, but the change looks great! http://gerrit.cloudera.org:8080/#/c/20753/1/be/src/exec/partitioned-hash-join-builder.h File be/src/exec/partitioned-hash-join-builder.h: http://gerrit.cloudera.org:8080/#/c/20753/1/be/src/exec/partitioned-hash-join-builder.h@87 PS1, Line 87: treat_nulls_equal_ IS NOT DISTINCT FROM is a well-known SQL term, I think it would be better to keep that, but also add the additional comments about NULL-handling. http://gerrit.cloudera.org:8080/#/c/20753/1/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: http://gerrit.cloudera.org:8080/#/c/20753/1/common/thrift/CatalogObjects.thrift@625 PS1, Line 625: equality_ids This might have a better place in TIcebergTable. Though I see this is probably temporary, and later we might have the equality_ids in THdfsFileDesc. http://gerrit.cloudera.org:8080/#/c/20753/1/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/20753/1/common/thrift/PlanNodes.thrift@404 PS1, Line 404: treat_nulls_equal Again, I think we should keep the SQL terminology here, but keep the additional comment about NULLs. http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@58 PS1, Line 58: except it returns True if the rhs is NULL Can we update this comment: except it returns True of both sides are NULLs. http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java: http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@106 PS1, Line 106: TODO Could you please add IMPALA-12598? http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@108 PS1, Line 108: column nit: columns http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@160 PS1, Line 160: || isDeleteRowsJoin_ This wouldn't be needed if we passed Operator.NOT_DISTINCT in the equality predicates. http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@197 PS1, Line 197: positionDeleteFiles_.isEmpty() && equalityDeleteFiles_.isEmpty() nit: for readability, it might be worth to extract this condition to a 'noDeleteFiles()' method http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@258 PS1, Line 258: addVirtualDataSeqNumSlot(tblRef_); nit: this is always needed for equality deletes, so this method call could be moved to addEqualityColumnSlots(). http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@323 PS1, Line 323: tblRef.getDesc().getSlots().stream() : .filter(s -> s.getVirtualColumnType() == : TVirtualColumnType.ICEBERG_DATA_SEQUENCE_NUMBER) : .findFirst() Maybe SingleNodePlanner.addSlotRefToDesc() could return he slot desc. http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@405 PS1, Line 405: dataSlotDesc.getColumn() instanceof IcebergColumn This must be always true for non-virtual columns, right? http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@424 PS1, Line 424: data file table? http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@429 PS1, Line 429: Operator.EQ Could we have Operator.NOT_DISTINCT here? http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@464 PS1, Line 464: if (getIceTable().getIcebergApiTable().schemas().size() > 1) { : throw new ImpalaRuntimeException("Equality delete files are not supported for " + : "tables with schema evolution"); : } Why do we have this restriction? We
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 12: (5 comments) > Uploaded patch set 12. http://gerrit.cloudera.org:8080/#/c/20506/10/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test File testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test: http://gerrit.cloudera.org:8080/#/c/20506/10/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@3 PS10, Line 3: testtbl1( > Actually you don't need it in SHOW TABLES either, you can just write I've removed it from most formats except orc and avro, it seems we have a bug where we use HIVE_QUERY which we can file as a bug in another jira. http://gerrit.cloudera.org:8080/#/c/20506/10/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@8 PS10, Line 8: # Make sure creating a table with the same name doesn't throw an error when > Ok, we can leave this test in. Done http://gerrit.cloudera.org:8080/#/c/20506/10/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@45 PS10, Line 45: QUERY > Ok. Ack http://gerrit.cloudera.org:8080/#/c/20506/5/tests/metadata/column-unicode.py File tests/metadata/column-unicode.py: http://gerrit.cloudera.org:8080/#/c/20506/5/tests/metadata/column-unicode.py@12 PS5, Line 12: > That being said, it's not required. Ack http://gerrit.cloudera.org:8080/#/c/20506/10/tests/shell/test_shell_interactive.py File tests/shell/test_shell_interactive.py: http://gerrit.cloudera.org:8080/#/c/20506/10/tests/shell/test_shell_interactive.py@408 PS10, Line 408: test_tbl > Ok, you can leave it like that. But in that case extracting the table name Ack -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 12 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 07 Dec 2023 10:50:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
pranav.lo...@cloudera.com has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. IMPALA-12465: Unicode column name support Impala depends on Hive functions for column name validation and uses validateName() function for the same. Since Hive already supports unicode column names, the patch just updates the column name validation function to validateColumnName(). validateName() checks for a certain conformance based on pattern matching standards while validateColumnName() places no restrictions on column names at the Metadata level. Testing: The support is tested and cross-checked with Hive. The tests can be found in unicode-column-name.test. Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 --- M fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java A testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test A tests/metadata/test_column_unicode.py M tests/shell/test_shell_interactive.py 6 files changed, 369 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/20506/12 -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 12 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20759 ) Change subject: IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns .. Patch Set 2: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/14609/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/20759 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273 Gerrit-Change-Number: 20759 Gerrit-PatchSet: 2 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 07 Dec 2023 10:36:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns
Tamas Mate has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/20759 ) Change subject: IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns .. IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns As the slots have already been created on the frontend this change focuses on populating them on the backend side. There are two major parts of this commit. Obtaining the right Accessors for the slot and recursively filling the tuples with data. The field ids are present in the struct slot's ColumnType field as a list of integers. This list can be indexed with the correct element of the SchemaPath to obtain the field id for a struct member and with that the Accessor. Once the Accessors are available the IcebergRowReader's MaterializeTuple method can be called recursively to write the primitive slots of a struct slot. Testing: - Added E2E tests Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273 --- M be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 8 files changed, 164 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/20759/2 -- To view, visit http://gerrit.cloudera.org:8080/20759 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273 Gerrit-Change-Number: 20759 Gerrit-PatchSet: 2 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/20759 ) Change subject: IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/20759/1/be/src/exec/iceberg-metadata/iceberg-row-reader.h File be/src/exec/iceberg-metadata/iceberg-row-reader.h: http://gerrit.cloudera.org:8080/#/c/20759/1/be/src/exec/iceberg-metadata/iceberg-row-reader.h@45 PS1, Line 45: Status MaterializeTuple(JNIEnv* env, jobject struct_like_row, const TupleDescriptor* tuple_desc, > line too long (98 > 90) Done http://gerrit.cloudera.org:8080/#/c/20759/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/20759/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@79 PS1, Line 79: accessed_value = env->CallObjectMethod(accessor, iceberg_accessor_get_, struct_like_row); > line too long (95 > 90) Done http://gerrit.cloudera.org:8080/#/c/20759/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@104 PS1, Line 104: RETURN_IF_ERROR(WriteStructSlot(env, struct_like_row, slot_desc, tuple, tuple_data_pool)); > line too long (98 > 90) Done http://gerrit.cloudera.org:8080/#/c/20759/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@172 PS1, Line 172: Status IcebergRowReader::WriteStructSlot(JNIEnv* env, jobject struct_like_row, SlotDescriptor* slot_desc, Tuple* tuple, > line too long (119 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/20759 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273 Gerrit-Change-Number: 20759 Gerrit-PatchSet: 1 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 07 Dec 2023 10:07:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11501: Add flag to allow catalog-cache operations on masked tables
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20742 ) Change subject: IMPALA-11501: Add flag to allow catalog-cache operations on masked tables .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20742 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45935654cbf05a55d740f1b04781022c271f7678 Gerrit-Change-Number: 20742 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 07 Dec 2023 09:30:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20485 ) Change subject: IMPALA-10949: Improve batching logic of partition events .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/20485/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20485/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@741 PS2, Line 741: Newer the name doesn't match the semantics, the function returns true if the event's event id is NOT newer than lastSyncedEventId http://gerrit.cloudera.org:8080/#/c/20485/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@747 PS2, Line 747: catch Exception handling could be skipped by using getTableNoThrow instead of getTable. It returns null if the db or table is not found. -- To view, visit http://gerrit.cloudera.org:8080/20485 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 Gerrit-Change-Number: 20485 Gerrit-PatchSet: 2 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 07 Dec 2023 09:06:55 + Gerrit-HasComments: Yes