[Impala-ASF-CR] IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20954 ) Change subject: IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10200/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/20954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I506f95527e741efe18c71706e2cdea51b45958b8 Gerrit-Change-Number: 20954 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Sat, 27 Jan 2024 07:56:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20954 ) Change subject: IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I506f95527e741efe18c71706e2cdea51b45958b8 Gerrit-Change-Number: 20954 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Sat, 27 Jan 2024 07:56:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12699: Set recv timeout for GetPartialCatalogObject Thrift RPC
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20957 ) Change subject: IMPALA-12699: Set recv timeout for GetPartialCatalogObject Thrift RPC .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15078/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20957 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I23995af4869e7ab8d826a1dd4d1197a21e738169 Gerrit-Change-Number: 20957 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 27 Jan 2024 06:10:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12699: Set recv timeout for GetPartialCatalogObject Thrift RPC
Wenzhe Zhou has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/20957 ) Change subject: IMPALA-12699: Set recv timeout for GetPartialCatalogObject Thrift RPC .. IMPALA-12699: Set recv timeout for GetPartialCatalogObject Thrift RPC GetPartialCatalogObject RPCs was seen to hang in coordinator side caused by networking issue. Due to the piggyback mechanism of fetching metadata in local-catalog mode, a hanging RPC on shared metadata could block other queries. Since GetPartialCatalogObject RPCs are read-only requests, they can be cleanly retried. This patch creates a dedicated catalogd client cache for GetPartialCatalogObject RPC, and defines a flag variable for the RPC timeout. Testing: - Manually verified the metrics of the new catalogd client cache. - Passed the core tests. Change-Id: I23995af4869e7ab8d826a1dd4d1197a21e738169 --- M be/src/exec/catalog-op-executor.cc M be/src/runtime/client-cache.cc M be/src/runtime/client-cache.h M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M common/thrift/metrics.json 6 files changed, 48 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/20957/3 -- To view, visit http://gerrit.cloudera.org:8080/20957 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I23995af4869e7ab8d826a1dd4d1197a21e738169 Gerrit-Change-Number: 20957 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20850 ) Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15077/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 10 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Sat, 27 Jan 2024 02:55:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#10). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exprs/expr-codegen-test.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/fragment-state.cc M be/src/runtime/fragment-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 24 files changed, 399 insertions(+), 98 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/10 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 10 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20850 ) Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. Patch Set 9: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/15076/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 9 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Sat, 27 Jan 2024 01:50:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20850 ) Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. Patch Set 7: (9 comments) http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/codegen/llvm-codegen.cc@263 PS7, Line 263: return status; > Should this also call codegen->reset()? Done http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/runtime/coordinator.cc@1050 PS7, Line 1050: coord_instance_->GetCodegenRef(results->GetCodegenPtr()); > Oh, I think I understand this now. It's assigning the codegen object held b Changed as suggested. http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/runtime/fragment-state.h File be/src/runtime/fragment-state.h: http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/runtime/fragment-state.h@58 PS7, Line 58: void GetCodegenRef(std::shared_ptr& codegen); > nit: this makes more sense to me as SetCodegenRef. And most idiomatic versi Changed to SetCodegenRef. I want to preserve the existing references. http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-options.h@53 PS1, Line 53: TImpalaQueryOptions::DELAY_MATERIALIZE_RESULTS_THRESHOLD + 1); \ > line too long (108 > 90) Done http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/service/query-result-set.h File be/src/service/query-result-set.h: http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/service/query-result-set.h@98 PS7, Line 98: std::shared_ptr& GetCodegenPtr() { return codegen; } > Safer to return a const&. Don't want a caller to be able to reset it to del This is set externally and needs to be non-const. http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/service/query-result-set.h@109 PS7, Line 109: std::shared_ptr codegen; > I don't see this set anywhere. It is set externally as a reference. http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc File be/src/service/query-result-set.cc: http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@190 PS1, Line 190: /// Add a row from a TResultRow > Functions and variables have blank lines around them, and comments describi Done http://gerrit.cloudera.org:8080/#/c/20850/3/be/src/service/query-result-set.cc File be/src/service/query-result-set.cc: http://gerrit.cloudera.org:8080/#/c/20850/3/be/src/service/query-result-set.cc@168 PS3, Line 168: > This should be moved to the 'private' block below. Done http://gerrit.cloudera.org:8080/#/c/20850/5/be/src/service/query-result-set.cc File be/src/service/query-result-set.cc: http://gerrit.cloudera.org:8080/#/c/20850/5/be/src/service/query-result-set.cc@107 PS5, Line 107: /// Handle to shared runtime state. This is reentrant > nit: Trailing slash. Done -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 7 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Sat, 27 Jan 2024 01:29:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#9). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exprs/expr-codegen-test.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/fragment-state.cc M be/src/runtime/fragment-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 23 files changed, 397 insertions(+), 96 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/9 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 9 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12426: Query History Table
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20770 ) Change subject: IMPALA-12426: Query History Table .. Patch Set 13: (14 comments) http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/CMakeLists.txt File be/src/service/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/CMakeLists.txt@46 PS12, Line 46: query-state-re > I recommend to rename this to query-state-record.cc to not confuse with be/ Done http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/impala-server.h@1261 PS13, Line 1261: /// Ticker that wakes up the completed_queried_thread at set intervals to process the : /// queued completed queries. : std::unique_ptr completed_queries_ticker_; : std::shared_ptr completed_queries_ready_; Mention in comment if they are synchronize using completed_queries_lock_. Also move this closer to completed_queries_lock_ declaration at line 1294. http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h@1256 PS12, Line 1256: std::unique_ptr admission_heartbeat_thr > Is this protected by completed_queries_lock_ as well? Done http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h@1281 PS12, Line 1281: : /// Default query options in the form of TQueryOptions and b > Mention constant/flag that represent maximum number of attempts in this doc Done http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h@1797 PS12, Line 1797: : : > Is this also immutable like QueryStateRecord? Please clarify in comment. Done http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc File be/src/service/workload-management.cc: http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@50 PS12, Line 50: : DEFINE_bool(enable_workload_mgmt, false, : "Specifies if Impala will automatically write completed queries in the query log " : "table. If this value is set to true and then later removed, the query log table " : "will remain intact and accessible."); : > Since it only support iceberg table now, shall this turn to bool flag? Done http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@63 PS12, Line 63: return false; : }); : > Validate against empty string. Done http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@186 PS12, Line 186: /// table. : static std::list> fields_; : > I agree generally. I think this particular instance is different since it's Ack http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@494 PS12, Line 494: fields_.push_back(MakeTuple("network_address", _clientNetworkAddress)); : fields_.push_back(MakeTuple("start_time_utc", _queryStartTime, "TIMESTAMP")); : fields_.push_back(MakeTuple("total_time_us", _queryDuration, "BIGINT")); : fields_.push_back(MakeTuple("s > Is field order matter here? If yes, please add comment to warn against reor Done http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@646 PS12, Line 646: { : lock_guard l(completed_queries_threadstate_mu_); : completed_queries_thread_state_ = RUNNING; : : completed_queries_ticker_ = make_unique( : std::chrono::seconds(FLAGS_query_log_write_interval_s), completed_queries_cv_, : completed_que > Add indentations and check wmEnabled() first. Done http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/workload-management.cc File be/src/service/workload-management.cc: http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/workload-management.cc@615 PS13, Line 615: || (query_handle->sql_stmt().length() >= 3 : && boost::algorithm::iequals(query_handle->sql_stmt().substr(0,3), "use")) : || (query_handle->sql_stmt().length() >= 4 : && boost::algorithm::iequals(query_handle->sql_stmt().substr(0,4), "show")) This might not catch query such as: -- intentional comment show column stats customer; Probably they should be disabled right from Frontend (UseStmt.java, SetStmt.java, ShowStmt.java). http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/util/network-util.h File be/src/util/network-util.h: http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/util/network-util.h@114
[Impala-ASF-CR] IMPALA-12540: Add system.impala query live table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20762 ) Change subject: IMPALA-12540: Add system.impala_query_live table .. Patch Set 26: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15074/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie2f9a449f0e5502078931e7f1c5df6e0b762c743 Gerrit-Change-Number: 20762 Gerrit-PatchSet: 26 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Sat, 27 Jan 2024 01:01:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12626: Capture tables in query for log
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20886 ) Change subject: IMPALA-12626: Capture tables in query for log .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15075/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20886 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c9c80b2adf7f3e44225a191fe8eb9df3c4bc5aa Gerrit-Change-Number: 20886 Gerrit-PatchSet: 8 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Sat, 27 Jan 2024 01:00:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12626: Capture tables in query for log
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20886 ) Change subject: IMPALA-12626: Capture tables in query for log .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/20886/7/be/src/service/query-state.cc File be/src/service/query-state.cc: http://gerrit.cloudera.org:8080/#/c/20886/7/be/src/service/query-state.cc@182 PS7, Line 182: > We could, but they'd only be sorted in each particular instance. When the c Oops, I was thinking of a different problem. But the list of tables is already sorted in https://gerrit.cloudera.org/c/20886/7/fe/src/main/java/org/apache/impala/service/Frontend.java#2381 as it's iterating over a sorted set. -- To view, visit http://gerrit.cloudera.org:8080/20886 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c9c80b2adf7f3e44225a191fe8eb9df3c4bc5aa Gerrit-Change-Number: 20886 Gerrit-PatchSet: 8 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Sat, 27 Jan 2024 00:34:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12426: Refactor QueryStateRecord includes
Michael Smith has abandoned this change. ( http://gerrit.cloudera.org:8080/20942 ) Change subject: IMPALA-12426: Refactor QueryStateRecord includes .. Abandoned Incorporated into parent. -- To view, visit http://gerrit.cloudera.org:8080/20942 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I02bc9bdb7a820fa01fbaa55526270f4aade749ee Gerrit-Change-Number: 20942 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12540: Add system.impala query live table
Hello Andrew Sherman, Jason Fehr, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20762 to look at the new patch set (#26). Change subject: IMPALA-12540: Add system.impala_query_live table .. IMPALA-12540: Add system.impala_query_live table Defines SystemTable which are in-memory tables that can provide access to Impala state. Adds the 'impala_query_live' to the database 'sys', which already exists for 'sys.impala_query_log'. Implements the 'impala_query_live' table to view active queries across all coordinators sharing the same statestore. SystemTables create new SystemTableScanNodes for their scan node implementation. When computing scan range locations, SystemTableScanNodes creates a scan range for each in the cluster (identified via ClusterMembershipMgr). This produces a plan that looks like: Query: explain select * from sys.impala_query_live ++ | Explain String | ++ | Max Per-Host Resource Reservation: Memory=4.00MB Threads=2 | | Per-Host Resource Estimates: Memory=11MB | | WARNING: The following tables are missing relevant table | | and/or column statistics. | | sys.impala_query_live | || | PLAN-ROOT SINK | | | | | 01:EXCHANGE [UNPARTITIONED]| | | | | 00:SCAN SYSTEM_TABLE [sys.impala_query_live]| |row-size=72B cardinality=20 | ++ Execution will pull data from ImpalaServer on the backend via a SystemTableScanner implementation based on table name. In the query profile, SYSTEM_TABLE_SCAN_NODE includes ActiveQueryCollectionTime and PendingQueryCollectionTime to track time spent collecting QueryState from ImpalaServer. Grants QueryScanner private access to ImpalaServer, identical to how ImpalaHttpHandler access internal server state. Change-Id: Ie2f9a449f0e5502078931e7f1c5df6e0b762c743 --- M be/src/exec/CMakeLists.txt M be/src/exec/exec-node.cc M be/src/exec/scan-node.cc A be/src/exec/system-table-scan-node.cc A be/src/exec/system-table-scan-node.h A be/src/exec/system-table-scanner.cc A be/src/exec/system-table-scanner.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/scheduling/scheduler.cc M be/src/service/fe-support.cc M be/src/service/frontend.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/query-state-record.h M be/src/service/workload-management.cc M be/src/util/sharded-query-map-util.h M common/thrift/CMakeLists.txt M common/thrift/CatalogObjects.thrift M common/thrift/Descriptors.thrift M common/thrift/PlanNodes.thrift M common/thrift/StatestoreService.thrift A common/thrift/SystemTables.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateTableStmt.java M fe/src/main/java/org/apache/impala/authorization/BaseAuthorizationChecker.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java A fe/src/main/java/org/apache/impala/catalog/SystemTable.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java A fe/src/main/java/org/apache/impala/planner/SystemTableScanNode.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/main/java/org/apache/impala/service/Frontend.java A tests/custom_cluster/test_query_live.py 35 files changed, 1,378 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/20762/26 -- To view, visit http://gerrit.cloudera.org:8080/20762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie2f9a449f0e5502078931e7f1c5df6e0b762c743 Gerrit-Change-Number: 20762 Gerrit-PatchSet: 26 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12626: Capture tables in query for log
Hello Andrew Sherman, Jason Fehr, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20886 to look at the new patch set (#8). Change subject: IMPALA-12626: Capture tables in query for log .. IMPALA-12626: Capture tables in query for log Currently requires 'drop table sys.impala_query_log' to recreate it with the new column. Change-Id: I9c9c80b2adf7f3e44225a191fe8eb9df3c4bc5aa --- M be/src/exec/system-table-scanner.cc M be/src/service/client-request-state.h M be/src/service/query-state-record.cc M be/src/service/query-state-record.h M be/src/service/workload-management.cc M common/thrift/Frontend.thrift M common/thrift/SystemTables.thrift M fe/src/main/java/org/apache/impala/service/Frontend.java M tests/custom_cluster/test_query_live.py M tests/custom_cluster/test_query_log.py 10 files changed, 60 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/20886/8 -- To view, visit http://gerrit.cloudera.org:8080/20886 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9c9c80b2adf7f3e44225a191fe8eb9df3c4bc5aa Gerrit-Change-Number: 20886 Gerrit-PatchSet: 8 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20850 ) Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. Patch Set 8: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/15073/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 8 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Sat, 27 Jan 2024 00:19:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#8). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exprs/expr-codegen-test.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/fragment-state.cc M be/src/runtime/fragment-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 22 files changed, 380 insertions(+), 79 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/8 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 8 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12426: Query History Table
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20770 ) Change subject: IMPALA-12426: Query History Table .. Patch Set 13: (3 comments) http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/workload-management.cc File be/src/service/workload-management.cc: http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/workload-management.cc@461 PS13, Line 461: remove_leading_newlines(formatted_plan); Can we remove these when we store it instead? http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/util/string-util.h File be/src/util/string-util.h: http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/util/string-util.h@105 PS13, Line 105: void to_snake_case(const std::string& in, std::stringstream* out); nit: ToSnakeCase to match other function names. http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/util/string-util.h@119 PS13, Line 119: void remove_leading_newlines(std::string& str); This is pretty easy and probably more performant as boost::algorithm::trim_left_if(str, boost::algorithm::is_any_of("\n")); -- To view, visit http://gerrit.cloudera.org:8080/20770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844 Gerrit-Change-Number: 20770 Gerrit-PatchSet: 13 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 26 Jan 2024 23:46:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20850 ) Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/runtime/coordinator.cc@1050 PS7, Line 1050: coord_instance_->GetCodegenRef(results->GetCodegenPtr()); Oh, I think I understand this now. It's assigning the codegen object held by FragmentInstance (via FragmentInstanceState aka coord_instance_) to the QueryResultSet results (via a ref GetCodegenPtr). That's completed inverted from what I'd expect. Why doesn't QueryResultSet have a SetCodegenRef, and the others GetCodegenRef that returns a shared_ptr? -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 7 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Fri, 26 Jan 2024 23:37:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20850 ) Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/codegen/llvm-codegen.cc@263 PS7, Line 263: return status; Should this also call codegen->reset()? http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/runtime/fragment-state.h File be/src/runtime/fragment-state.h: http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/runtime/fragment-state.h@58 PS7, Line 58: void GetCodegenRef(std::shared_ptr& codegen); nit: this makes more sense to me as SetCodegenRef. And most idiomatic version would take std::shared_ptr by value and std::move to codegen_. http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/service/query-result-set.h File be/src/service/query-result-set.h: http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/service/query-result-set.h@98 PS7, Line 98: std::shared_ptr& GetCodegenPtr() { return codegen; } Safer to return a const&. Don't want a caller to be able to reset it to delete the codegen early. http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/service/query-result-set.h@109 PS7, Line 109: std::shared_ptr codegen; I don't see this set anywhere. -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 7 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Fri, 26 Jan 2024 23:30:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12747: Atomic update of execution state
Michael Smith has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20956 ) Change subject: IMPALA-12747: Atomic update of execution state .. IMPALA-12747: Atomic update of execution state QueryDriver owns instances of ClientRequestState and TExecRequest. The ClientRequestState is used to track execution state of the client-facing side of a query. TExecRequest encapsulates context about the query produced by the planner. When a QueryDriver is created, it creates an instance of ClientRequestState, but has not yet executed planning. It would create an empty TExecRequest and pass a pointer to it to ClientRequestState, then update the content of TExecRequest when RunFrontendPlanner is called from ImpalaServer::ExecuteInternal. Updating TExecRequest was not atomic, so it was possible other operations - like producing a QueryStateRecord for /queries in the web UI - would try to read the content of TExecRequest while updating. This caused TSAN errors and occasional crashes in internal-server-test, which runs concurrent requests and examines them through calls to /queries. Changes ClientRequestState to - Provide a static placeholder for TExecRequest during creation that represents an empty context for an UNKNOWN statement type (default initialized in Thrift). - Make all references to TExecRequest const so its content cannot be updated in a non-thread-safe manner. - ClientRequestState uses an AtomicPtr which is updated atomically when the filled TExecRequest is available. QueryDriver does not publicly expose access to TExecRequest, so we can ensure its use is thread-safe without atomics. ClientRequestState::exec_request() will return either a reference to the static placeholder or the value provided after - which is never changed - so this reference will always be valid for the lifetime of the ClientRequestState. Updates user_has_profile_access to be AtomicBool for the same reason. Reverts tsan-suppressions for IMPALA-12660 so we get TSAN coverage. Adds suppression for a lock-order-inversion bug (IMPALA-12757) that was uncovered after fixing this data race. Testing: - InternalServerTest.SimultaneousMultipleQueriesOneSession would fail after ~10 test runs. Ran 90 times without failure. - Passed TSAN run of backend tests. Change-Id: I9a967c5c84b6a401f8f5764373f6cd7ee807545f Reviewed-on: http://gerrit.cloudera.org:8080/20956 Reviewed-by: Jason Fehr Reviewed-by: Riza Suminto Tested-by: Impala Public Jenkins --- M be/src/runtime/query-driver.cc M be/src/runtime/query-driver.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M bin/tsan-suppressions.txt M common/thrift/Frontend.thrift M common/thrift/Types.thrift 9 files changed, 209 insertions(+), 185 deletions(-) Approvals: Jason Fehr: Looks good to me, but someone else must approve Riza Suminto: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/20956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I9a967c5c84b6a401f8f5764373f6cd7ee807545f Gerrit-Change-Number: 20956 Gerrit-PatchSet: 8 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-12426: Query History Table
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20770 ) Change subject: IMPALA-12426: Query History Table .. Patch Set 13: (9 comments) http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/impala-server.h@1653 PS13, Line 1653: nit: unnecessary added whitespace http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.h File be/src/service/query-state-record.h: http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.h@183 PS13, Line 183: class EventsTimelineIterator { Missing code comments outlining the purpose of this class, and what each function does. http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.h@192 PS13, Line 192: EventsTimelineIterator(const std::vector* labels, These would make more sense as const&. Passing nullptr is invalid, and we're not mutating them, which are the two use-cases for passing by pointer. http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.h@206 PS13, Line 206: iter_t operator*() const; This class is serving two purposes that might be better served as two separate classes. It's basically https://en.cppreference.com/w/cpp/ranges/zip_view (or boost::combine) with a custom iterator implemented for the events timeline. http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.h@287 PS13, Line 287: EventsTimelineIterator CEventsTimeline() const; nit: it looks strange to prefix things with C just to say they're const. http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.cc File be/src/service/query-state-record.cc: http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.cc@182 PS13, Line 182: admission_control_time_since_last_update = adm_ctrl_ptr == NULL ? 0 : This and compute_scan_range_assignment are only initialized if summary_profile is non-null. This can be pretty confusing. https://en.cppreference.com/w/cpp/language/value_initialization goes into more detail than I can. But it's safer to give primitive types an explicit default value. http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.cc@252 PS13, Line 252: return EventsTimelineIterator(labels_, timestamps_, labels_->cbegin(), nit: you don't need to use cbegin/cend on const objects. See https://en.cppreference.com/w/cpp/container/vector/begin for the definitions. It should be rare that you need to use the cbegin/cend versions; it has a pretty specific use-case, essentially https://stackoverflow.com/a/12001519. http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/util/CMakeLists.txt File be/src/util/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/util/CMakeLists.txt@272 PS13, Line 272: ADD_UNIFIED_BE_LSAN_TEST(string-util-test "TruncateDownTest.*:TruncateUpTest.*:CommaSeparatedContainsTest.*:FindUtf8PosForwardTest.*:FindUtf8PosBackwardTest.*:RandomFindUtf8PosTest.*:ToSnakeCaseTest.*:RemoveLeadingNewlinesTest.*:StreamSteamPopTest.*") typo: StringStreamPopTest http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/util/string-util-test.cc File be/src/util/string-util-test.cc: http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/util/string-util-test.cc@319 PS13, Line 319: TEST(StreamSteamPopTest, NotEmptyPopOnce) { typo: StringStreamPopTest -- To view, visit http://gerrit.cloudera.org:8080/20770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844 Gerrit-Change-Number: 20770 Gerrit-PatchSet: 13 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 26 Jan 2024 23:12:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20850 ) Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. Patch Set 7: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/15072/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 7 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Fri, 26 Jan 2024 23:15:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12747: Atomic update of execution state
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20956 ) Change subject: IMPALA-12747: Atomic update of execution state .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a967c5c84b6a401f8f5764373f6cd7ee807545f Gerrit-Change-Number: 20956 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 26 Jan 2024 23:15:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#7). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/buffered-plan-root-sink.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/fragment-state.cc M be/src/runtime/fragment-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 20 files changed, 359 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/7 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 7 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20850 ) Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. Patch Set 6: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/15071/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 6 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Fri, 26 Jan 2024 22:57:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12426: Query History Table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20770 ) Change subject: IMPALA-12426: Query History Table .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15070/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844 Gerrit-Change-Number: 20770 Gerrit-PatchSet: 13 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 26 Jan 2024 22:47:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20850 ) Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/20850/6/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20850/6/be/src/codegen/llvm-codegen.cc@267 PS6, Line 267: MemTracker* parent_mem_tracker, const string& id, std::shared_ptr* codegen) { line too long (94 > 90) -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 6 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Fri, 26 Jan 2024 22:37:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#6). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/buffered-plan-root-sink.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/fragment-state.cc M be/src/runtime/fragment-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 20 files changed, 358 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/6 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 6 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12426: Query History Table
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20770 ) Change subject: IMPALA-12426: Query History Table .. Patch Set 13: (3 comments) http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/util/runtime-profile.h File be/src/util/runtime-profile.h: http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/util/runtime-profile.h@257 PS12, Line 257: CounterMap counter_map_; > How about updating the implementation of GetCounter to use const accessors Done http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/util/string-util.cc File be/src/util/string-util.cc: http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/util/string-util.cc@198 PS13, Line 198: if (!str().empty()) { str() can be expensive, it makes a copy of the underlying string. Use tellp() > 0 instead. http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/util/uid-util.h File be/src/util/uid-util.h: http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/util/uid-util.h@134 PS13, Line 134: inline bool is_uuid_empty(const TUniqueId& id) { Seems like this should be IsUUIDEmpty to match other functions in this header. -- To view, visit http://gerrit.cloudera.org:8080/20770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844 Gerrit-Change-Number: 20770 Gerrit-PatchSet: 13 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 26 Jan 2024 22:35:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12426: Query History Table
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20770 ) Change subject: IMPALA-12426: Query History Table .. Patch Set 13: (3 comments) http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h@1861 PS12, Line 1861: > You'd need to std::move the value into place in the constructor. Done http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.h File be/src/service/query-state-record.h: http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.h@161 PS13, Line 161: private: nit: private should be indented differently than the other lines. Previously it had a 1-space indent. http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.h@212 PS13, Line 212: EventsTimelineIterator cbegin(); If you called these begin/end, I think you could use this iterator as part of range-based for loops. https://en.cppreference.com/w/cpp/language/range-for describes the constraints. -- To view, visit http://gerrit.cloudera.org:8080/20770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844 Gerrit-Change-Number: 20770 Gerrit-PatchSet: 13 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 26 Jan 2024 22:22:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12426: Query History Table
Jason Fehr has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/20770 ) Change subject: IMPALA-12426: Query History Table .. IMPALA-12426: Query History Table Adds the ability for users to specify that Impala will create and maintain an internal Iceberg table that contains data about all completed queries. This table is automatically created at startup by each coordinator if it does not exist. Then, most completed queries are queued in memory and flushed to the query history table at a set interval (either minutes or number of records). Set, use, and show queries are not written to this table. This commit leverages the InternalServer class to maintain the query history table. Ctest unit tests have been added to assert the various pieces of code. New custom cluster tests have been added to assert the query history table is properly populated with completed queries. Negative testing consists of attempting sql injection attacks and syntactically incorrect queries. Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844 --- M be/src/benchmarks/expr-benchmark.cc M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M be/src/runtime/query-driver.cc M be/src/runtime/query-driver.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/CMakeLists.txt M be/src/service/client-request-state.h M be/src/service/impala-http-handler.cc M be/src/service/impala-http-handler.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/internal-server-test.cc M be/src/service/internal-server.cc M be/src/service/internal-server.h A be/src/service/query-state-record.cc A be/src/service/query-state-record.h A be/src/service/workload-management.cc M be/src/util/CMakeLists.txt M be/src/util/backend-gflag-util.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h A be/src/util/network-util-test.cc M be/src/util/network-util.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/string-util-test.cc M be/src/util/string-util.cc M be/src/util/string-util.h A be/src/util/ticker.h M be/src/util/uid-util-test.cc M be/src/util/uid-util.h M bin/run_clang_tidy.sh M common/function-registry/impala_functions.py M common/thrift/BackendGflags.thrift M common/thrift/metrics.json M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/util/CatalogBlacklistUtils.java M tests/beeswax/impala_beeswax.py M tests/common/custom_cluster_test_suite.py A tests/custom_cluster/test_query_log.py A tests/util/assert_time.py A tests/util/memory.py A tests/util/retry.py 47 files changed, 3,350 insertions(+), 336 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/20770/13 -- To view, visit http://gerrit.cloudera.org:8080/20770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844 Gerrit-Change-Number: 20770 Gerrit-PatchSet: 13 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-12747: Atomic update of execution state
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20956 ) Change subject: IMPALA-12747: Atomic update of execution state .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/20956/7/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/20956/7/be/src/service/client-request-state.cc@1998 PS7, Line 1998: case TStmtType::UNKNOWN: This was necessary to avoid audit logging queries that hadn't started. Previously they defaulted to QUERY, which also hit this branch. -- To view, visit http://gerrit.cloudera.org:8080/20956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a967c5c84b6a401f8f5764373f6cd7ee807545f Gerrit-Change-Number: 20956 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 26 Jan 2024 22:18:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12747: Atomic update of execution state
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20956 ) Change subject: IMPALA-12747: Atomic update of execution state .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a967c5c84b6a401f8f5764373f6cd7ee807545f Gerrit-Change-Number: 20956 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 26 Jan 2024 22:16:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12747: Atomic update of execution state
Jason Fehr has posted comments on this change. ( http://gerrit.cloudera.org:8080/20956 ) Change subject: IMPALA-12747: Atomic update of execution state .. Patch Set 7: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a967c5c84b6a401f8f5764373f6cd7ee807545f Gerrit-Change-Number: 20956 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 26 Jan 2024 22:10:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20954 ) Change subject: IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I506f95527e741efe18c71706e2cdea51b45958b8 Gerrit-Change-Number: 20954 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 26 Jan 2024 22:04:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12381: Set JDBC related properties in JDBC data source
gaurav singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/20941 ) Change subject: IMPALA-12381: Set JDBC related properties in JDBC data source .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0a0b5d9d7b06825842828c3722c2bcdb4ea8 Gerrit-Change-Number: 20941 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: gaurav singh Gerrit-Comment-Date: Fri, 26 Jan 2024 21:09:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15069/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 10 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 26 Jan 2024 21:04:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 10: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 10 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 26 Jan 2024 20:53:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/20533/9/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/20533/9/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2453 PS9, Line 2453: catalog_.getDb("database_to_be_dropped").getMetaStoreDb().getLocationUri(); > nit: +2 indentation Done -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 10 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 26 Jan 2024 20:36:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Hello Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, Csaba Ringhofer, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20533 to look at the new patch set (#10). Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. IMPALA-12463: Batch non-consecutive table events in the event processor The current batching of events requires events to be consecutive. When there are multiple tables being modified, events can be interleaved such that each batch is very small. If the batching criteria can be relaxed, the non-consecutive events could be batched and processed more efficiently. This implements batching for non-consecutive events by keeping state on each table individually. Different tables can continue to accumulate batchable events independently unless they hit a condition that cuts the batch. The batching can ignore some events on unrelated tables, but the same rules apply about the batching of events on an individual table. For example, for a particular table, any non-INSERT event between two INSERT events on that table continues to cut the batching. In addition, there are certain cross-table events that need to cut batches across multiple tables: 1. Drop database / alter database cuts any batches on tables in the affected database. 2. Alter table rename cuts any batches on the source or destination table. This emits events in monotonically increasing order by maintaining the resulting events in a sorted map. All non-batchable events will be emitted in the original order. Batchable events are emitted based on the ending Event ID of the batch. This means that batchable events can move later in the sequence, but they cannot move earlier. This is based on the original design by Wenzhe Zhou. Testing: - MetastoreEventsProcessorTest has new tests for interleaved events on two tables as well as tests for events that cut batches across tables (alter table, drop database, alter database). - A core job showed no other test failures. Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 2 files changed, 383 insertions(+), 66 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/10 -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 10 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 9: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/20533/9/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/20533/9/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2453 PS9, Line 2453: catalog_.getDb("database_to_be_dropped").getMetaStoreDb().getLocationUri(); nit: +2 indentation -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 9 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 26 Jan 2024 20:23:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12503: Support date data type for predicates for external data source table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20915 ) Change subject: IMPALA-12503: Support date data type for predicates for external data source table .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15068/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20915 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4 Gerrit-Change-Number: 20915 Gerrit-PatchSet: 9 Gerrit-Owner: gaurav singh Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: gaurav singh Gerrit-Comment-Date: Fri, 26 Jan 2024 20:03:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12695: Crash with UNION with complex types
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20953 ) Change subject: IMPALA-12695: Crash with UNION with complex types .. Patch Set 1: (1 comment) The code looks good to me (or at least I have no idea how to do it better :) ). The tests could be extended for zipping unnest. http://gerrit.cloudera.org:8080/#/c/20953/1/testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test File testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test: http://gerrit.cloudera.org:8080/#/c/20953/1/testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test@156 PS1, Line 156: item Can you also add tests with unnest() in select list /from clause,e.g in zipping-unnest-in-from-clause.test / zipping-unnest-in-select-list.test Quickly checked with v as (select arr1 from complextypes_arrays union all select arr1 from complextypes_arrays) , v2 as (select unnest(arr1) a from v) select * from v2 where a=1; it returns a not too friendly error message: ERROR: IllegalStateException: Failed analysis after expr substitution. CAUSED BY: ClassCastException: class org.apache.impala.analysis.NumericLiteral cannot be cast to class org.apache.impala.analysis.SlotRef (org.apache.impala.analysis.NumericLiteral and org.apache.impala.analysis.SlotRef are in unnamed module of loader 'app') another one crashes Impala: with v as (select arr1, arr2 from complextypes_arrays union all select arr1, arr2 from complextypes_arrays) , v2 as (select unnest(arr1) a1, unnest(arr2) a2 from v) select * from v2 where a2="one" -- To view, visit http://gerrit.cloudera.org:8080/20953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I340adc50e6d7cda6f59dacd7a46b6adc31635d46 Gerrit-Change-Number: 20953 Gerrit-PatchSet: 1 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Comment-Date: Fri, 26 Jan 2024 19:40:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12503: Support date data type for predicates for external data source table
gaurav singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/20915 ) Change subject: IMPALA-12503: Support date data type for predicates for external data source table .. Patch Set 9: (2 comments) http://gerrit.cloudera.org:8080/#/c/20915/8/testdata/bin/load-ext-data-sources.sh File testdata/bin/load-ext-data-sources.sh: http://gerrit.cloudera.org:8080/#/c/20915/8/testdata/bin/load-ext-data-sources.sh@84 PS8, Line 84: alltypes_with_date > How about rename this table as alltypes_with_date? This is not jdbc data so yes makes sense. http://gerrit.cloudera.org:8080/#/c/20915/8/testdata/bin/load-ext-data-sources.sh@105 PS8, Line 105: > remove this tmp file in the end of script Done -- To view, visit http://gerrit.cloudera.org:8080/20915 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4 Gerrit-Change-Number: 20915 Gerrit-PatchSet: 9 Gerrit-Owner: gaurav singh Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: gaurav singh Gerrit-Comment-Date: Fri, 26 Jan 2024 19:37:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12503: Support date data type for predicates for external data source table
Hello Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20915 to look at the new patch set (#9). Change subject: IMPALA-12503: Support date data type for predicates for external data source table .. IMPALA-12503: Support date data type for predicates for external data source table This patch adds support for datatype date as predicates for external data sources. Testing: - Manually tested binary date predicates with operators: '=', '>', '<', 'BETWEEN'. - Modified the existing schema and add test cases for date predicate for postgres, mysql and impala databases. alltypes_impala and AllTypesCaseSensitiveNames_impala Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4 --- M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java M java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java M java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java M java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java M testdata/bin/create-ext-data-source-table.sql M testdata/bin/load-ext-data-sources.sh M testdata/bin/setup-mysql-env.sh M testdata/workloads/functional-query/queries/QueryTest/impala-ext-jdbc-tables.test M testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test M testdata/workloads/functional-query/queries/QueryTest/mysql-ext-jdbc-tables.test 11 files changed, 218 insertions(+), 81 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/20915/9 -- To view, visit http://gerrit.cloudera.org:8080/20915 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4 Gerrit-Change-Number: 20915 Gerrit-PatchSet: 9 Gerrit-Owner: gaurav singh Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: gaurav singh
[Impala-ASF-CR] IMPALA-12747: Atomic update of execution state
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20956 ) Change subject: IMPALA-12747: Atomic update of execution state .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10199/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a967c5c84b6a401f8f5764373f6cd7ee807545f Gerrit-Change-Number: 20956 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 26 Jan 2024 18:44:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12503: Support date data type for predicates for external data source table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20915 ) Change subject: IMPALA-12503: Support date data type for predicates for external data source table .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15067/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20915 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4 Gerrit-Change-Number: 20915 Gerrit-PatchSet: 8 Gerrit-Owner: gaurav singh Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: gaurav singh Gerrit-Comment-Date: Fri, 26 Jan 2024 18:43:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12503: Support date data type for predicates for external data source table
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/20915 ) Change subject: IMPALA-12503: Support date data type for predicates for external data source table .. Patch Set 8: (2 comments) Please add more test cases for predicates with date type http://gerrit.cloudera.org:8080/#/c/20915/8/testdata/bin/load-ext-data-sources.sh File testdata/bin/load-ext-data-sources.sh: http://gerrit.cloudera.org:8080/#/c/20915/8/testdata/bin/load-ext-data-sources.sh@84 PS8, Line 84: alltypes_jdbc_impala_datasource How about rename this table as alltypes_with_date? This is not jdbc data source table since there is no "produced by ..." in the statement. http://gerrit.cloudera.org:8080/#/c/20915/8/testdata/bin/load-ext-data-sources.sh@105 PS8, Line 105: /tmp/impala_jdbc_alltypes.sql remove this tmp file in the end of script -- To view, visit http://gerrit.cloudera.org:8080/20915 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4 Gerrit-Change-Number: 20915 Gerrit-PatchSet: 8 Gerrit-Owner: gaurav singh Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: gaurav singh Gerrit-Comment-Date: Fri, 26 Jan 2024 18:35:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12747: Atomic update of execution state
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20956 ) Change subject: IMPALA-12747: Atomic update of execution state .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15066/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a967c5c84b6a401f8f5764373f6cd7ee807545f Gerrit-Change-Number: 20956 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 26 Jan 2024 18:30:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12503: Support date data type for predicates for external data source table
gaurav singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/20915 ) Change subject: IMPALA-12503: Support date data type for predicates for external data source table .. Patch Set 7: (8 comments) http://gerrit.cloudera.org:8080/#/c/20915/7/testdata/bin/setup-impala-env.sh File testdata/bin/setup-impala-env.sh: http://gerrit.cloudera.org:8080/#/c/20915/7/testdata/bin/setup-impala-env.sh@44 PS7, Line 44: VARCHAR(10) > change to STRING Done http://gerrit.cloudera.org:8080/#/c/20915/7/testdata/bin/setup-impala-env.sh@51 PS7, Line 51: DROP TABLE IF EXISTS AllTypesCaseSensitiveNames_impala; : CREATE TABLE AllTypesCaseSensitiveNames_impala > Impala is case insensitive. Don't need to create this table Done http://gerrit.cloudera.org:8080/#/c/20915/7/testdata/bin/setup-impala-env.sh@63 PS7, Line 63: VARCHAR(10) > change to STRING Done http://gerrit.cloudera.org:8080/#/c/20915/7/testdata/bin/setup-impala-env.sh@71 PS7, Line 71: # Load data to jdbc table : cat ${IMPALA_HOME}/testdata/target/AllTypes/* > /tmp/impala_jdbc_alltypes.csv : : hadoop fs -mkdir -p hdfs:///tmp/ : hadoop fs -put /tmp/impala_jdbc_alltypes.csv \ : hdfs:///tmp/impala_jdbc_alltypes >/dev/null 2>&1 : : impala-shell -i localhost -q "use functional;load data inpath 'hdfs:///tmp/\ : impala_jdbc_alltypes' into table alltypes_impala;" : : hadoop fs -put /tmp/impala_jdbc_alltypes.csv hdfs:///tmp/impala_jdbc_alltypes >\ : /dev/null 2>&1 : : impala-shell -i localhost -q "use functional;load data inpath 'hdfs:///tmp/\ : impala_jdbc_alltypes' into table AllTypesCaseSensitiveNames_impala;" > load data from existing table with 'insert select' as Done http://gerrit.cloudera.org:8080/#/c/20915/7/testdata/workloads/functional-query/queries/QueryTest/impala-ext-jdbc-tables.test File testdata/workloads/functional-query/queries/QueryTest/impala-ext-jdbc-tables.test: http://gerrit.cloudera.org:8080/#/c/20915/7/testdata/workloads/functional-query/queries/QueryTest/impala-ext-jdbc-tables.test@74 PS7, Line 74: AllTypesCaseSensitiveNames_impala > still use alltypes_impala since we don't need to create AllTypesCaseSensiti Done http://gerrit.cloudera.org:8080/#/c/20915/7/testdata/workloads/functional-query/queries/QueryTest/impala-ext-jdbc-tables.test@83 PS7, Line 83: where float_col = 0 > it was fixed after changing the way to load data from existing table with ' Done http://gerrit.cloudera.org:8080/#/c/20915/7/testdata/workloads/functional-query/queries/QueryTest/impala-ext-jdbc-tables.test@226 PS7, Line 226: limit 5 > change to 'order by id limit 5' to get result consistently Done http://gerrit.cloudera.org:8080/#/c/20915/7/testdata/workloads/functional-query/queries/QueryTest/impala-ext-jdbc-tables.test@240 PS7, Line 240: limit 5 > change to 'order by id limit 5' to get result consistently Done -- To view, visit http://gerrit.cloudera.org:8080/20915 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4 Gerrit-Change-Number: 20915 Gerrit-PatchSet: 7 Gerrit-Owner: gaurav singh Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: gaurav singh Gerrit-Comment-Date: Fri, 26 Jan 2024 18:18:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12503: Support date data type for predicates for external data source table
Hello Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20915 to look at the new patch set (#8). Change subject: IMPALA-12503: Support date data type for predicates for external data source table .. IMPALA-12503: Support date data type for predicates for external data source table This patch adds support for datatype date as predicates for external data sources. Testing: - Manually tested binary date predicates with operators: '=', '>', '<', 'BETWEEN'. - Modified the existing schema and test cases for postgres and mysql databases. - Modified the impala-to-impala test by adding a new tables alltypes_impala and AllTypesCaseSensitiveNames_impala Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4 --- M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java M java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java M java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java M java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java M testdata/bin/create-ext-data-source-table.sql M testdata/bin/load-ext-data-sources.sh M testdata/bin/setup-mysql-env.sh M testdata/workloads/functional-query/queries/QueryTest/impala-ext-jdbc-tables.test M testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test M testdata/workloads/functional-query/queries/QueryTest/mysql-ext-jdbc-tables.test M tests/custom_cluster/test_ext_data_sources.py 12 files changed, 218 insertions(+), 83 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/20915/8 -- To view, visit http://gerrit.cloudera.org:8080/20915 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4 Gerrit-Change-Number: 20915 Gerrit-PatchSet: 8 Gerrit-Owner: gaurav singh Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: gaurav singh
[Impala-ASF-CR] IMPALA-12503: Support date data type for predicates for external data source table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20915 ) Change subject: IMPALA-12503: Support date data type for predicates for external data source table .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/20915/8/testdata/bin/load-ext-data-sources.sh File testdata/bin/load-ext-data-sources.sh: http://gerrit.cloudera.org:8080/#/c/20915/8/testdata/bin/load-ext-data-sources.sh@100 PS8, Line 100: SELECT id, bool_col, tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col, line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/20915 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4 Gerrit-Change-Number: 20915 Gerrit-PatchSet: 8 Gerrit-Owner: gaurav singh Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: gaurav singh Gerrit-Comment-Date: Fri, 26 Jan 2024 18:19:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12747: Atomic update of execution state
Hello Quanlong Huang, Riza Suminto, Jason Fehr, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20956 to look at the new patch set (#7). Change subject: IMPALA-12747: Atomic update of execution state .. IMPALA-12747: Atomic update of execution state QueryDriver owns instances of ClientRequestState and TExecRequest. The ClientRequestState is used to track execution state of the client-facing side of a query. TExecRequest encapsulates context about the query produced by the planner. When a QueryDriver is created, it creates an instance of ClientRequestState, but has not yet executed planning. It would create an empty TExecRequest and pass a pointer to it to ClientRequestState, then update the content of TExecRequest when RunFrontendPlanner is called from ImpalaServer::ExecuteInternal. Updating TExecRequest was not atomic, so it was possible other operations - like producing a QueryStateRecord for /queries in the web UI - would try to read the content of TExecRequest while updating. This caused TSAN errors and occasional crashes in internal-server-test, which runs concurrent requests and examines them through calls to /queries. Changes ClientRequestState to - Provide a static placeholder for TExecRequest during creation that represents an empty context for an UNKNOWN statement type (default initialized in Thrift). - Make all references to TExecRequest const so its content cannot be updated in a non-thread-safe manner. - ClientRequestState uses an AtomicPtr which is updated atomically when the filled TExecRequest is available. QueryDriver does not publicly expose access to TExecRequest, so we can ensure its use is thread-safe without atomics. ClientRequestState::exec_request() will return either a reference to the static placeholder or the value provided after - which is never changed - so this reference will always be valid for the lifetime of the ClientRequestState. Updates user_has_profile_access to be AtomicBool for the same reason. Reverts tsan-suppressions for IMPALA-12660 so we get TSAN coverage. Adds suppression for a lock-order-inversion bug (IMPALA-12757) that was uncovered after fixing this data race. Testing: - InternalServerTest.SimultaneousMultipleQueriesOneSession would fail after ~10 test runs. Ran 90 times without failure. - Passed TSAN run of backend tests. Change-Id: I9a967c5c84b6a401f8f5764373f6cd7ee807545f --- M be/src/runtime/query-driver.cc M be/src/runtime/query-driver.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M bin/tsan-suppressions.txt M common/thrift/Frontend.thrift M common/thrift/Types.thrift 9 files changed, 209 insertions(+), 185 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/20956/7 -- To view, visit http://gerrit.cloudera.org:8080/20956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9a967c5c84b6a401f8f5764373f6cd7ee807545f Gerrit-Change-Number: 20956 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-12503: Support date data type for predicates for external data source table
gaurav singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/20915 ) Change subject: IMPALA-12503: Support date data type for predicates for external data source table .. Patch Set 7: (9 comments) http://gerrit.cloudera.org:8080/#/c/20915/6/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java: http://gerrit.cloudera.org:8080/#/c/20915/6/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@66 PS6, Line 66: long > define as 'static final long' type for this constant. Also move this line a Done http://gerrit.cloudera.org:8080/#/c/20915/6/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@214 PS6, Line 214: mm > nit: -MM-dd to keep consistent with code in the function Done http://gerrit.cloudera.org:8080/#/c/20915/6/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@220 PS6, Line 220: > nit: two more indent space Done http://gerrit.cloudera.org:8080/#/c/20915/6/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java: http://gerrit.cloudera.org:8080/#/c/20915/6/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java@28 PS6, Line 28: import org.apache.impala.extdatasource.jdbc.dao.DatabaseAccessor; > nit: keep alphabet order Done http://gerrit.cloudera.org:8080/#/c/20915/6/testdata/bin/clean-impala-env.sh File testdata/bin/clean-impala-env.sh: http://gerrit.cloudera.org:8080/#/c/20915/6/testdata/bin/clean-impala-env.sh@1 PS6, Line 1: #!/bin/bash > Don't need this script to clean up if we combine setup-impala-env.sh with t Done http://gerrit.cloudera.org:8080/#/c/20915/6/testdata/bin/setup-impala-env.sh File testdata/bin/setup-impala-env.sh: http://gerrit.cloudera.org:8080/#/c/20915/6/testdata/bin/setup-impala-env.sh@29 PS6, Line 29: # Create local impala tables > You can combine this script with testdata/bin/load-ext-data-sources.sh, whi Done http://gerrit.cloudera.org:8080/#/c/20915/6/testdata/bin/setup-impala-env.sh@32 PS6, Line 32: alltypes_impala > name the table as alltypes_jdbc_impala_datasource Done http://gerrit.cloudera.org:8080/#/c/20915/6/testdata/bin/setup-impala-env.sh@49 PS6, Line 49: cat > /tmp/impala_jdbc_alltypes_with_case_sensitive_names.sql <<__EOT__ : USE FUNCTIONAL; : DROP TABLE IF EXISTS AllTypesCaseSensitiveNames_impala; : CREATE TABLE AllTypesCaseSensitiveNames_impala : ( : id INT, : Bool_colBOOLEAN, : Tinyint_col SMALLINT, : Smallint_colSMALLINT, : Int_col INT, : Bigint_col BIGINT, : Float_col FLOAT, : Double_col DOUBLE, : Date_colDATE, : String_col VARCHAR(10), : Timestamp_col TIMESTAMP : ); : __EOT_ > Impala is case insensitive. All database names and table names are converte Done http://gerrit.cloudera.org:8080/#/c/20915/6/tests/custom_cluster/test_ext_data_sources.py File tests/custom_cluster/test_ext_data_sources.py: http://gerrit.cloudera.org:8080/#/c/20915/6/tests/custom_cluster/test_ext_data_sources.py@209 PS6, Line 209: # create alltypes_impala and load data into it. : script = os.path.join(os.environ['IMPALA_HOME'], 'testdata/bin/setup-impala-env.sh') : run_cmd = [script] : try: : subprocess.check_call(run_cmd, close_fds=True) : except subprocess.CalledProcessError: : assert False, "Failed to setup impala env" : : @classmethod : def _remove_impala_test_env(cls): : # drop table alltypes_impala : script = os.path.join(os.environ['IMPALA_HOME'], 'testdata/bin/clean-impala-env.sh') : run_cmd = [script] : subprocess.check_call(run_cmd, close_fds=True) : try: : subprocess.check_call(run_cmd, close_fds=True) : except subprocess.CalledProcessError: : assert False, "Failed to clean impala env" > Don't need this functions if we combine setup-impala-env.sh with testdata/b Done -- To view, visit http://gerrit.cloudera.org:8080/20915 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
[Impala-ASF-CR] IMPALA-12747: Atomic update of execution state
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20956 ) Change subject: IMPALA-12747: Atomic update of execution state .. Patch Set 6: (2 comments) > Patch Set 6: > > (2 comments) > > > Patch Set 2: Verified-1 > > > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10198/ > > It seems there are issues on redaction: > > custom_cluster/test_redaction.py:298: in test_redacted > self.assert_log_redaction(credit_card, "\*credit card\*", > expect_audit=False) > custom_cluster/test_redaction.py:133: in assert_log_redaction > assert_no_files_in_dir_contain(self.audit_dir, unredacted_value) > common/file_utils.py:178: in assert_no_files_in_dir_contain > % (dir, search) > E AssertionError: > /home/ubuntu/Impala/logs/custom_cluster_tests/test_redaction_fvRZ3Z/audits > should not have any file containing '1234-5678-1234-5678' but a file was found Test failure seems to be caused by changing the default TStmtType value to UNKNOWN. Working on tracking that down. http://gerrit.cloudera.org:8080/#/c/20956/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20956/6//COMMIT_MSG@9 PS6, Line 9: QueryStateRecord > "QueryDriver" ? Done http://gerrit.cloudera.org:8080/#/c/20956/6/be/src/runtime/query-driver.cc File be/src/runtime/query-driver.cc: http://gerrit.cloudera.org:8080/#/c/20956/6/be/src/runtime/query-driver.cc@119 PS6, Line 119: RETURN_IF_ERROR(DoFrontendPlanning(query_ctx, false)); > Why we need to do planning here? Isn't the 'external_exec_request' already Done -- To view, visit http://gerrit.cloudera.org:8080/20956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a967c5c84b6a401f8f5764373f6cd7ee807545f Gerrit-Change-Number: 20956 Gerrit-PatchSet: 6 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 26 Jan 2024 17:47:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20954 ) Change subject: IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15065/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I506f95527e741efe18c71706e2cdea51b45958b8 Gerrit-Change-Number: 20954 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 26 Jan 2024 15:28:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/20954 ) Change subject: IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error .. Patch Set 2: (3 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/20954/1/be/src/exec/iceberg-delete-sink-base.cc File be/src/exec/iceberg-delete-sink-base.cc: http://gerrit.cloudera.org:8080/#/c/20954/1/be/src/exec/iceberg-delete-sink-base.cc@91 PS1, Line 91: if (IsTimestamp(scalar_type) || IsDateTime(scalar_type)) { > I don't think we need these DCHECKs here since you give an error from the b Done http://gerrit.cloudera.org:8080/#/c/20954/1/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java: http://gerrit.cloudera.org:8080/#/c/20954/1/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java@21 PS1, Line 21: import org.apache.impala.catalog.ScalarType; > Is this used? Done http://gerrit.cloudera.org:8080/#/c/20954/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java: http://gerrit.cloudera.org:8080/#/c/20954/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@562 PS1, Line 562: transformParam), > nit: For me it was a bit misleading that this param got into the same line Done -- To view, visit http://gerrit.cloudera.org:8080/20954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I506f95527e741efe18c71706e2cdea51b45958b8 Gerrit-Change-Number: 20954 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 26 Jan 2024 15:02:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error
Hello Gabor Kaszab, Noemi Pap-Takacs, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20954 to look at the new patch set (#2). Change subject: IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error .. IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error Iceberg tables can be identity partitioned by any type, e.g. int, date and even float. If a table is partitioned, the file path contains the partition value in human readable form, and this form is expected to be passed to CatalogD. When an UPDATE or DELETE command is executed, we don't transform the integer date value to human readable format, which causes errors in CatalogD. With this patch, we transform identity-partitioned date values to human-readable format. Note on floating point numbers: When users partition their data via floating point values (users should not do that), then the file paths created for delete files might not correspond to the data files (e.g. '1.1' vs '1.10023841858'). Though the values are the same in the Iceberg metadata layer, so it doesn't cause correctness issues. Testing: * added e2e tests for DELETEs * added e2e tests for UPDATEs Change-Id: I506f95527e741efe18c71706e2cdea51b45958b8 --- M be/src/exec/iceberg-delete-sink-base.cc M be/src/exec/iceberg-delete-sink-base.h M be/src/exec/table-sink-base.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/ScalarType.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java M fe/src/test/java/org/apache/impala/util/IcebergUtilTest.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-delete-partitioned.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-update-partitions.test 15 files changed, 160 insertions(+), 74 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/20954/2 -- To view, visit http://gerrit.cloudera.org:8080/20954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I506f95527e741efe18c71706e2cdea51b45958b8 Gerrit-Change-Number: 20954 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs
[Impala-ASF-CR] IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20954 ) Change subject: IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error .. Patch Set 1: Code-Review+1 (3 comments) Thanks for the quick fix, Zoltan! In general the patch seems fine, I only have some nits. http://gerrit.cloudera.org:8080/#/c/20954/1/be/src/exec/iceberg-delete-sink-base.cc File be/src/exec/iceberg-delete-sink-base.cc: http://gerrit.cloudera.org:8080/#/c/20954/1/be/src/exec/iceberg-delete-sink-base.cc@91 PS1, Line 91: DCHECK(!IsTimestamp(scalar_type)); I don't think we need these DCHECKs here since you give an error from the below if anyway. http://gerrit.cloudera.org:8080/#/c/20954/1/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java: http://gerrit.cloudera.org:8080/#/c/20954/1/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java@21 PS1, Line 21: import org.apache.impala.catalog.PrimitiveType; Is this used? http://gerrit.cloudera.org:8080/#/c/20954/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java: http://gerrit.cloudera.org:8080/#/c/20954/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@562 PS1, Line 562: transformParam), Type.fromTScalarType(field.getType(; nit: For me it was a bit misleading that this param got into the same line as the end of IcebergPartitionTransform constructor call. Maybe in a new line it would express better for the reader that it belongs to the IcebergPartitionField creation. -- To view, visit http://gerrit.cloudera.org:8080/20954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I506f95527e741efe18c71706e2cdea51b45958b8 Gerrit-Change-Number: 20954 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Comment-Date: Fri, 26 Jan 2024 11:37:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/20951 ) Change subject: IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables .. Patch Set 3: (6 comments) Hi Gabor, thank you for the feature, it looks great. Marked some nits and I have a question, what was the reason behind moving from Thrift to FlatBuffer? http://gerrit.cloudera.org:8080/#/c/20951/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20951/3//COMMIT_MSG@7 PS3, Line 7: d nit: d http://gerrit.cloudera.org:8080/#/c/20951/3//COMMIT_MSG@9 PS3, Line 9: has nit: have It refers to the plural object. http://gerrit.cloudera.org:8080/#/c/20951/3/common/fbs/IcebergObjects.fbs File common/fbs/IcebergObjects.fbs: http://gerrit.cloudera.org:8080/#/c/20951/3/common/fbs/IcebergObjects.fbs@53 PS3, Line 53: equality_field_ids I saw this in multiple places, the new term became equality_fiel_ids, in many places it was left as equality_ids. I will mark some and I think we should use one of them to avoid confusion and to be consistent. Iceberg uses equality_ids to refer to equality field ids. https://iceberg.apache.org/spec/ http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@126 PS3, Line 126: private Set allEqualityFieldIds_ = new HashSet<>(); : private Map, Set> equalityIdsToDeleteFiles_ = : new HashMap<>(); nit: Here we use both equality ids and equality field ids. http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@510 PS3, Line 510: d nit: D http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-equality-deletes.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-equality-deletes.test: http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-equality-deletes.test@114 PS3, Line 114: filed nit: field -- To view, visit http://gerrit.cloudera.org:8080/20951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79 Gerrit-Change-Number: 20951 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 26 Jan 2024 10:33:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12747: Atomic update of execution state
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20956 ) Change subject: IMPALA-12747: Atomic update of execution state .. Patch Set 6: (2 comments) > Patch Set 2: Verified-1 > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10198/ It seems there are issues on redaction: custom_cluster/test_redaction.py:298: in test_redacted self.assert_log_redaction(credit_card, "\*credit card\*", expect_audit=False) custom_cluster/test_redaction.py:133: in assert_log_redaction assert_no_files_in_dir_contain(self.audit_dir, unredacted_value) common/file_utils.py:178: in assert_no_files_in_dir_contain % (dir, search) E AssertionError: /home/ubuntu/Impala/logs/custom_cluster_tests/test_redaction_fvRZ3Z/audits should not have any file containing '1234-5678-1234-5678' but a file was found http://gerrit.cloudera.org:8080/#/c/20956/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20956/6//COMMIT_MSG@9 PS6, Line 9: QueryStateRecord "QueryDriver" ? http://gerrit.cloudera.org:8080/#/c/20956/6/be/src/runtime/query-driver.cc File be/src/runtime/query-driver.cc: http://gerrit.cloudera.org:8080/#/c/20956/6/be/src/runtime/query-driver.cc@119 PS6, Line 119: RETURN_IF_ERROR(DoFrontendPlanning(query_ctx, false)); Why we need to do planning here? Isn't the 'external_exec_request' already the TExecRequest to use? EDIT: Just realized it's an existing behavior. Can we comment it somewhere that we want one from the Impala planner to compare with? -- To view, visit http://gerrit.cloudera.org:8080/20956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a967c5c84b6a401f8f5764373f6cd7ee807545f Gerrit-Change-Number: 20956 Gerrit-PatchSet: 6 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 26 Jan 2024 08:46:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12699: Set recv timeout for GetPartialCatalogObject Thrift RPC
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20957 ) Change subject: IMPALA-12699: Set recv timeout for GetPartialCatalogObject Thrift RPC .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15063/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20957 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I23995af4869e7ab8d826a1dd4d1197a21e738169 Gerrit-Change-Number: 20957 Gerrit-PatchSet: 1 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 26 Jan 2024 08:10:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12699: Set recv timeout for GetPartialCatalogObject Thrift RPC
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20957 ) Change subject: IMPALA-12699: Set recv timeout for GetPartialCatalogObject Thrift RPC .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15064/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20957 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I23995af4869e7ab8d826a1dd4d1197a21e738169 Gerrit-Change-Number: 20957 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 26 Jan 2024 08:09:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12448: Avoid getting stuck when refreshing a non-existent partition
ttz has posted comments on this change. ( http://gerrit.cloudera.org:8080/20490 ) Change subject: IMPALA-12448: Avoid getting stuck when refreshing a non-existent partition .. Patch Set 23: > Patch Set 23: > > (10 comments) > > Sorry to be late on this. It took me some time to revisit the details. Hi, Quanlong! Thank you for your reply. I will handle it in the next couple of days. -- To view, visit http://gerrit.cloudera.org:8080/20490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iace7cdadda300b03896f92415822266354421887 Gerrit-Change-Number: 20490 Gerrit-PatchSet: 23 Gerrit-Owner: ttz <2433038...@qq.com> Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: ttz <2433038...@qq.com> Gerrit-Comment-Date: Fri, 26 Jan 2024 07:56:37 + Gerrit-HasComments: No