[kudu-CR] [metrics] Add tablet level metrics for scans op time
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21291 ) Change subject: [metrics] Add tablet level metrics for scans op time .. [metrics] Add tablet level metrics for scans op time We currently have monitoring in place for scan data volume and slow scans, but we are still lacking monitoring data for scan request timings. In this patch, I have added monitoring for scan request timings at the tablet level to assist us in pinpointing specific scenarios of high CPU usage during scanning operations. Change-Id: I8f490cfb6f37aef60b34697100fb502374fcc503 Reviewed-on: http://gerrit.cloudera.org:8080/21291 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin --- M src/kudu/tablet/tablet_metrics.cc M src/kudu/tablet/tablet_metrics.h M src/kudu/tserver/scanners.cc M src/kudu/tserver/scanners.h M src/kudu/tserver/tablet_server-test.cc M src/kudu/util/stopwatch.h 6 files changed, 237 insertions(+), 45 deletions(-) Approvals: Kudu Jenkins: Verified Alexey Serbin: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I8f490cfb6f37aef60b34697100fb502374fcc503 Gerrit-Change-Number: 21291 Gerrit-PatchSet: 7 Gerrit-Owner: KeDeng Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: KeDeng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yingchun Lai
[kudu-CR] [metrics] Add tablet level metrics for scans op time
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21291 ) Change subject: [metrics] Add tablet level metrics for scans op time .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8f490cfb6f37aef60b34697100fb502374fcc503 Gerrit-Change-Number: 21291 Gerrit-PatchSet: 6 Gerrit-Owner: KeDeng Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: KeDeng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Thu, 23 May 2024 19:30:11 + Gerrit-HasComments: No
[kudu-CR] [metrics] Add tablet level metrics for scans op time
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21291 ) Change subject: [metrics] Add tablet level metrics for scans op time .. Patch Set 5: Code-Review+1 (12 comments) Almost there, just a style and wording nits. Thank you for revving the patch! http://gerrit.cloudera.org:8080/#/c/21291/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21291/5//COMMIT_MSG@10 PS5, Line 10: scan time nit: scan request timings That's less ambiguous since 'scan time' might be confused with the total time spent to fetch all the data from all the involved tablet servers based on the scan's spec. >From what I could see, this patch adds histogram metrics to track timings of >individual scan requests (both new and continuation ones). http://gerrit.cloudera.org:8080/#/c/21291/5//COMMIT_MSG@11 PS5, Line 11: scan time nit: scan request timings http://gerrit.cloudera.org:8080/#/c/21291/5/src/kudu/tablet/tablet_metrics.h File src/kudu/tablet/tablet_metrics.h: http://gerrit.cloudera.org:8080/#/c/21291/5/src/kudu/tablet/tablet_metrics.h@68 PS5, Line 68: scoped_refptr tablet_scans_duration_wall_time; : scoped_refptr tablet_scans_duration_system_time; : scoped_refptr tablet_scans_duration_user_time; These metrics are already in the tablet context, similar to all the other fields in this TabletMetrics struct. With that, maybe rename these to match the others: scan_duration_system_time scan_duration_user_time scan_duration_wall_time http://gerrit.cloudera.org:8080/#/c/21291/5/src/kudu/tablet/tablet_metrics.cc File src/kudu/tablet/tablet_metrics.cc: http://gerrit.cloudera.org:8080/#/c/21291/5/src/kudu/tablet/tablet_metrics.cc@157 PS5, Line 157: Tablet Scans Duration In Wall Time Scan Requests Wall Time http://gerrit.cloudera.org:8080/#/c/21291/5/src/kudu/tablet/tablet_metrics.cc@159 PS5, Line 159: Wall time spent scanners scans Duration of scan requests, wall time http://gerrit.cloudera.org:8080/#/c/21291/5/src/kudu/tablet/tablet_metrics.cc@164 PS5, Line 164: Tablet Scans Duration In System Time Scan Requests System Time http://gerrit.cloudera.org:8080/#/c/21291/5/src/kudu/tablet/tablet_metrics.cc@166 PS5, Line 166: System time spent scanners scans. Duration of scan requests, system time http://gerrit.cloudera.org:8080/#/c/21291/5/src/kudu/tablet/tablet_metrics.cc@171 PS5, Line 171: Tablet Scans Duration In User Time Scan Requests User Time http://gerrit.cloudera.org:8080/#/c/21291/5/src/kudu/tablet/tablet_metrics.cc@173 PS5, Line 173: User time spent scanners scans Duration of scan requests, user time http://gerrit.cloudera.org:8080/#/c/21291/5/src/kudu/tserver/scanners.h File src/kudu/tserver/scanners.h: http://gerrit.cloudera.org:8080/#/c/21291/5/src/kudu/tserver/scanners.h@410 PS5, Line 410: Add the timings in 'elapsed' to the metrics of tablet. Store the scan request's timings in the tablet metrics. http://gerrit.cloudera.org:8080/#/c/21291/5/src/kudu/tserver/scanners.h@411 PS5, Line 411: update_tablet_metrics style nit: since this isn't an accessor/getter, this should be named like UpdateTabletMetrics() http://gerrit.cloudera.org:8080/#/c/21291/5/src/kudu/tserver/scanners.cc File src/kudu/tserver/scanners.cc: http://gerrit.cloudera.org:8080/#/c/21291/5/src/kudu/tserver/scanners.cc@529 PS5, Line 529: The time spent on this scanner scan nit: Store scan request's timings -- To view, visit http://gerrit.cloudera.org:8080/21291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8f490cfb6f37aef60b34697100fb502374fcc503 Gerrit-Change-Number: 21291 Gerrit-PatchSet: 5 Gerrit-Owner: KeDeng Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: KeDeng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Thu, 23 May 2024 03:49:47 + Gerrit-HasComments: Yes
[kudu-CR](branch-1.17.x) Fix row project codegen params noalias overflow
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21448 ) Change subject: Fix row_project codegen params noalias overflow .. Fix row_project codegen params noalias overflow function->addParamAttr is 0-based indexes, current row_project generator IR code is: `define i1 @ProjRead(i8* %src, %"class.kudu::RowBlockRow"* noalias %rbrow, %"class.kudu::Arena"* noalias %arena)` not same as before. Change-Id: I1dab7d46cff96ed1ebbd020584a066f04e6ca12a Reviewed-on: http://gerrit.cloudera.org:8080/19952 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin (cherry picked from commit c485c8c3cf4b76d8a55e2ec98e27803341285f75) Reviewed-on: http://gerrit.cloudera.org:8080/21448 Tested-by: Alexey Serbin --- M src/kudu/codegen/row_projector.cc 1 file changed, 1 insertion(+), 2 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/21448 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: branch-1.17.x Gerrit-MessageType: merged Gerrit-Change-Id: I1dab7d46cff96ed1ebbd020584a066f04e6ca12a Gerrit-Change-Number: 21448 Gerrit-PatchSet: 3 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR](branch-1.17.x) Fix row project codegen params noalias overflow
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21448 ) Change subject: Fix row_project codegen params noalias overflow .. Patch Set 2: > By the way, I think kudu codegen sometimes not flexible, it need > compiler-clang version and llvm-lib version match.Otherwise > precompiled IR can not be complied by llvm-lib due to poor IR-code > compatibility. > Do we consider use a real JIT to generate code? This will avoid > mismatch between clang version and llvm-lib version. AFAIK, to build its codegen components, Kudu always uses LLVM and CLANG from Kudu's thirdparty, and no external components are used for that. With that, there isn't a problem with version mismatch, IIUC. However, there is an issue with thirdparty's CLANG automatically picking up the very latest version of GCC toolchain (the STL library, etc.) if it's available at the build machine -- it might lead to ABI incompatibility and crashes if the build machine has multiple versions of GCC toolchain installed. The issue is tracked by KUDU-3545. It seems that might be fixed once upgrading Kudu's thirdparty to LLVM/CLANG 16.0.0 or newer and start using the --gcc-install-dir flag to have more control on the GCC toolchain used for Kudu's codegen. As for utilizing more JIT components/features, so far I don't know about currently ongoing or planned projects in this area. Feel free to file a JIRA with detailed description to provide more context on that and share your ideas. Thank you! -- To view, visit http://gerrit.cloudera.org:8080/21448 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: branch-1.17.x Gerrit-MessageType: comment Gerrit-Change-Id: I1dab7d46cff96ed1ebbd020584a066f04e6ca12a Gerrit-Change-Number: 21448 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Thu, 23 May 2024 02:58:32 + Gerrit-HasComments: No
[kudu-CR](branch-1.17.x) Fix row project codegen params noalias overflow
Alexey Serbin has removed a vote on this change. Change subject: Fix row_project codegen params noalias overflow .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/21448 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: branch-1.17.x Gerrit-MessageType: deleteVote Gerrit-Change-Id: I1dab7d46cff96ed1ebbd020584a066f04e6ca12a Gerrit-Change-Number: 21448 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR](branch-1.17.x) Fix row project codegen params noalias overflow
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21448 ) Change subject: Fix row_project codegen params noalias overflow .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21448 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: branch-1.17.x Gerrit-MessageType: comment Gerrit-Change-Id: I1dab7d46cff96ed1ebbd020584a066f04e6ca12a Gerrit-Change-Number: 21448 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Thu, 23 May 2024 02:35:07 + Gerrit-HasComments: No
[kudu-CR](branch-1.17.x) Fix row project codegen params noalias overflow
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21448 ) Change subject: Fix row_project codegen params noalias overflow .. Patch Set 2: Verified+1 unrelated test failures -- To view, visit http://gerrit.cloudera.org:8080/21448 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: branch-1.17.x Gerrit-MessageType: comment Gerrit-Change-Id: I1dab7d46cff96ed1ebbd020584a066f04e6ca12a Gerrit-Change-Number: 21448 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Thu, 23 May 2024 02:35:01 + Gerrit-HasComments: No
[kudu-CR] Add a benchmark for CBTree concurrent writes.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21447 ) Change subject: Add a benchmark for CBTree concurrent writes. .. Patch Set 1: Code-Review+1 (15 comments) http://gerrit.cloudera.org:8080/#/c/21447/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21447/1//COMMIT_MSG@32 PS1, Line 32: writing nit: writer ? Otherwise, maybe re-phrase it to be "Reading threads ... writing threads ..." http://gerrit.cloudera.org:8080/#/c/21447/1//COMMIT_MSG@37 PS1, Line 37: eachi nit: each http://gerrit.cloudera.org:8080/#/c/21447/1/src/kudu/tablet/cbtree-test.cc File src/kudu/tablet/cbtree-test.cc: http://gerrit.cloudera.org:8080/#/c/21447/1/src/kudu/tablet/cbtree-test.cc@31 PS1, Line 31: #include "kudu/gutil/stringprintf.h" It seems you'll need to rebase this patch on top the HEAD revision in the master branch of the repo to get '#include "kudu/gutil/atomicops.h"' that IWYU wants to see here. http://gerrit.cloudera.org:8080/#/c/21447/1/src/kudu/tablet/cbtree-test.cc@907 PS1, Line 907: snprintf(kbuf, 64, "key_%d", i); nit: consider ether converting this into a functor visible only in the scope where the first two arguments are defined (and maybe even using sizeof(...) instead of hard-coded constant, if possible) or change the signature to supply information on the size of the buffers. Alternatively, use std::array for the first two parameters. http://gerrit.cloudera.org:8080/#/c/21447/1/src/kudu/tablet/cbtree-test.cc@924 PS1, Line 924: constexpr int trials = 10; style nit here and below: use kCamelName pattern for static/constexpr constants http://gerrit.cloudera.org:8080/#/c/21447/1/src/kudu/tablet/cbtree-test.cc@925 PS1, Line 925: constexpr int num_writer_threads = 9; : constexpr int num_reader_threads = 10; Do you want these to be hard-coded, or there might be some extra value in defining these via command-line flags (with default values as they are set now), so one could try to see how CBTree behaves with customized settings? http://gerrit.cloudera.org:8080/#/c/21447/1/src/kudu/tablet/cbtree-test.cc@928 PS1, Line 928: esy ? http://gerrit.cloudera.org:8080/#/c/21447/1/src/kudu/tablet/cbtree-test.cc@931 PS1, Line 931: ≡ nit: does LINT and 'git check' approve using this symbol? http://gerrit.cloudera.org:8080/#/c/21447/1/src/kudu/tablet/cbtree-test.cc@938 PS1, Line 938: barrier nit: barriers? http://gerrit.cloudera.org:8080/#/c/21447/1/src/kudu/tablet/cbtree-test.cc@947 PS1, Line 947: variables nit: values http://gerrit.cloudera.org:8080/#/c/21447/1/src/kudu/tablet/cbtree-test.cc@958 PS1, Line 958: for (;;) style nit for here and below: is it possible to use 'while (true)' to match the rest of the code in this file and in the Kudu project overall? http://gerrit.cloudera.org:8080/#/c/21447/1/src/kudu/tablet/cbtree-test.cc@1024 PS1, Line 1024: Threads nit: threads http://gerrit.cloudera.org:8080/#/c/21447/1/src/kudu/tablet/cbtree-test.cc@1028 PS1, Line 1028: LOG_TIMING(INFO, Does it make sense to check for failures before starting the second phase of the scenario? http://gerrit.cloudera.org:8080/#/c/21447/1/src/kudu/tablet/cbtree-test.cc@1029 PS1, Line 1029: $2 nit: $2 values ? http://gerrit.cloudera.org:8080/#/c/21447/1/src/kudu/tablet/cbtree-test.cc@1029 PS1, Line 1029: Threads nit: threads -- To view, visit http://gerrit.cloudera.org:8080/21447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1b0b16e269c70716962fc5ebb4ddca1e2cbe68a4 Gerrit-Change-Number: 21447 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Wed, 22 May 2024 23:26:13 + Gerrit-HasComments: Yes
[kudu-CR](branch-1.17.x) Fix deadlock on fail for CBTree-test
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21451 ) Change subject: Fix deadlock on fail for CBTree-test .. Fix deadlock on fail for CBTree-test When TestConcurrentIterateAndInsert, TestConcurrentInsert, TestRacyConcurrentInsert fail while --gtest_repeat is used, they will keep running forever. Instead of just returning on fail, they should properly stop the other threads running, and then exit. To reproduce the problem, run this on ARM (where the test actually fails): ./bin/cbtree-test --gtest_repeat=100 --gtest_filter=*Racy* Change-Id: Ia10d05dfdc4a12cb034450f432693f054d138498 Reviewed-on: http://gerrit.cloudera.org:8080/21446 Tested-by: Alexey Serbin Reviewed-by: Alexey Serbin (cherry picked from commit d1bc5b53ef006ac14bca4417c4745c5875d89734) Reviewed-on: http://gerrit.cloudera.org:8080/21451 Reviewed-by: Abhishek Chennaka --- M src/kudu/tablet/cbtree-test.cc 1 file changed, 4 insertions(+), 3 deletions(-) Approvals: Abhishek Chennaka: Looks good to me, approved Alexey Serbin: Verified -- To view, visit http://gerrit.cloudera.org:8080/21451 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: branch-1.17.x Gerrit-MessageType: merged Gerrit-Change-Id: Ia10d05dfdc4a12cb034450f432693f054d138498 Gerrit-Change-Number: 21451 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Zoltan Martonka
[kudu-CR](branch-1.17.x) Fix deadlock on fail for CBTree-test
Alexey Serbin has removed a vote on this change. Change subject: Fix deadlock on fail for CBTree-test .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/21451 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: branch-1.17.x Gerrit-MessageType: deleteVote Gerrit-Change-Id: Ia10d05dfdc4a12cb034450f432693f054d138498 Gerrit-Change-Number: 21451 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Zoltan Martonka
[kudu-CR](branch-1.17.x) Fix deadlock on fail for CBTree-test
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21451 ) Change subject: Fix deadlock on fail for CBTree-test .. Patch Set 1: Verified+1 unrelated test failures due to: * somewhat polluted thirdparty workspace (it's been built for the main branch, not for branch-1.17.x) * known flakiness mostly related to Ranger's warts and farts -- To view, visit http://gerrit.cloudera.org:8080/21451 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: branch-1.17.x Gerrit-MessageType: comment Gerrit-Change-Id: Ia10d05dfdc4a12cb034450f432693f054d138498 Gerrit-Change-Number: 21451 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Wed, 22 May 2024 21:12:05 + Gerrit-HasComments: No
[kudu-CR](branch-1.17.x) Fix deadlock on fail for CBTree-test
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21451 Change subject: Fix deadlock on fail for CBTree-test .. Fix deadlock on fail for CBTree-test When TestConcurrentIterateAndInsert, TestConcurrentInsert, TestRacyConcurrentInsert fail while --gtest_repeat is used, they will keep running forever. Instead of just returning on fail, they should properly stop the other threads running, and then exit. To reproduce the problem, run this on ARM (where the test actually fails): ./bin/cbtree-test --gtest_repeat=100 --gtest_filter=*Racy* Change-Id: Ia10d05dfdc4a12cb034450f432693f054d138498 Reviewed-on: http://gerrit.cloudera.org:8080/21446 Tested-by: Alexey Serbin Reviewed-by: Alexey Serbin (cherry picked from commit d1bc5b53ef006ac14bca4417c4745c5875d89734) --- M src/kudu/tablet/cbtree-test.cc 1 file changed, 4 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/21451/1 -- To view, visit http://gerrit.cloudera.org:8080/21451 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: branch-1.17.x Gerrit-MessageType: newchange Gerrit-Change-Id: Ia10d05dfdc4a12cb034450f432693f054d138498 Gerrit-Change-Number: 21451 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Zoltan Martonka
[kudu-CR] Fix deadlock on fail for CBTree-test
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21446 ) Change subject: Fix deadlock on fail for CBTree-test .. Fix deadlock on fail for CBTree-test When TestConcurrentIterateAndInsert, TestConcurrentInsert, TestRacyConcurrentInsert fail while --gtest_repeat is used, they will keep running forever. Instead of just returning on fail, they should properly stop the other threads running, and then exit. To reproduce the problem, run this on ARM (where the test actually fails): ./bin/cbtree-test --gtest_repeat=100 --gtest_filter=*Racy* Change-Id: Ia10d05dfdc4a12cb034450f432693f054d138498 Reviewed-on: http://gerrit.cloudera.org:8080/21446 Tested-by: Alexey Serbin Reviewed-by: Alexey Serbin --- M src/kudu/tablet/cbtree-test.cc 1 file changed, 4 insertions(+), 3 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/21446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ia10d05dfdc4a12cb034450f432693f054d138498 Gerrit-Change-Number: 21446 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Zoltan Martonka
[kudu-CR] Fix deadlock on fail for CBTree-test
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21446 ) Change subject: Fix deadlock on fail for CBTree-test .. Patch Set 4: Verified+1 unrelated test failures -- To view, visit http://gerrit.cloudera.org:8080/21446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia10d05dfdc4a12cb034450f432693f054d138498 Gerrit-Change-Number: 21446 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Wed, 22 May 2024 19:22:23 + Gerrit-HasComments: No
[kudu-CR] Fix deadlock on fail for CBTree-test
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21446 ) Change subject: Fix deadlock on fail for CBTree-test .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia10d05dfdc4a12cb034450f432693f054d138498 Gerrit-Change-Number: 21446 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Wed, 22 May 2024 19:22:31 + Gerrit-HasComments: No
[kudu-CR] Fix deadlock on fail for CBTree-test
Alexey Serbin has removed a vote on this change. Change subject: Fix deadlock on fail for CBTree-test .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/21446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: Ia10d05dfdc4a12cb034450f432693f054d138498 Gerrit-Change-Number: 21446 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Zoltan Martonka
[kudu-CR] Fix row project codegen params noalias overflow
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/19952 ) Change subject: Fix row_project codegen params noalias overflow .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/19952/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19952/5//COMMIT_MSG@9 PS5, Line 9: function->addParamAttr is 0-based indexes, current row_project generator IR code is: : `define i1 @ProjRead(i8* %src, %"class.kudu::RowBlockRow"* noalias %rbrow, %"class.kudu::Arena"* noalias %arena)` > nit: alignment required I'm not going to update this -- this patch is already merged. -- To view, visit http://gerrit.cloudera.org:8080/19952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dab7d46cff96ed1ebbd020584a066f04e6ca12a Gerrit-Change-Number: 19952 Gerrit-PatchSet: 5 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Wed, 22 May 2024 13:53:41 + Gerrit-HasComments: Yes
[kudu-CR](branch-1.17.x) Fix row project codegen params noalias overflow
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21448 Change subject: Fix row_project codegen params noalias overflow .. Fix row_project codegen params noalias overflow function->addParamAttr is 0-based indexes, current row_project generator IR code is: `define i1 @ProjRead(i8* %src, %"class.kudu::RowBlockRow"* noalias %rbrow, %"class.kudu::Arena"* noalias %arena)` not same with before. Change-Id: I1dab7d46cff96ed1ebbd020584a066f04e6ca12a Reviewed-on: http://gerrit.cloudera.org:8080/19952 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin (cherry picked from commit c485c8c3cf4b76d8a55e2ec98e27803341285f75) --- M src/kudu/codegen/row_projector.cc 1 file changed, 1 insertion(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/21448/1 -- To view, visit http://gerrit.cloudera.org:8080/21448 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: branch-1.17.x Gerrit-MessageType: newchange Gerrit-Change-Id: I1dab7d46cff96ed1ebbd020584a066f04e6ca12a Gerrit-Change-Number: 21448 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Anonymous Coward
[kudu-CR] Fix row project codegen params noalias overflow
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/19952 ) Change subject: Fix row_project codegen params noalias overflow .. Patch Set 4: Code-Review+2 Thanks a lot for the patch! I was cleaning up my mailbox and found an email about this patch. It's been almost a year! I'm not sure how I missed this. :) Indeed: the newly addParamAttr() uses 0-based indices once it has been introduced in LLVM 5.0.0 with [1] in March 2017. [1] https://github.com/llvm/llvm-project/commit/a0b45f4bfccb3ab197ca504032c28160ce82eac2 -- To view, visit http://gerrit.cloudera.org:8080/19952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dab7d46cff96ed1ebbd020584a066f04e6ca12a Gerrit-Change-Number: 19952 Gerrit-PatchSet: 4 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Wed, 22 May 2024 05:57:05 + Gerrit-HasComments: No
[kudu-CR] Fix row project codegen params noalias overflow
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19952 ) Change subject: Fix row_project codegen params noalias overflow .. Fix row_project codegen params noalias overflow function->addParamAttr is 0-based indexes, current row_project generator IR code is: `define i1 @ProjRead(i8* %src, %"class.kudu::RowBlockRow"* noalias %rbrow, %"class.kudu::Arena"* noalias %arena)` not same with before. Change-Id: I1dab7d46cff96ed1ebbd020584a066f04e6ca12a Reviewed-on: http://gerrit.cloudera.org:8080/19952 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin --- M src/kudu/codegen/row_projector.cc 1 file changed, 1 insertion(+), 2 deletions(-) Approvals: Kudu Jenkins: Verified Alexey Serbin: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/19952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I1dab7d46cff96ed1ebbd020584a066f04e6ca12a Gerrit-Change-Number: 19952 Gerrit-PatchSet: 5 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-613: Introduce new BlockCache metrics
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21389 ) Change subject: KUDU-613: Introduce new BlockCache metrics .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/21389/2/src/kudu/util/block_cache_metrics.cc File src/kudu/util/block_cache_metrics.cc: PS2: > I was thinking more in the terms of a single entry life journey tracking i. I don't think we could provide a per-element metric in this context. The way how it's usually achieved is maintaining a histogram-like metric that accumulates stats on the distribution of a gauge. I guess you are suggesting to collect stats on the frequency of seeing a particular key. But that's rather in the design of the cache, not just metrics of the SLRU cache. E.g., the TinyLFU cache keeps similar stats on per-element basis (actually, it's closer to Bloom filter's approach) and uses that for admission control decisions. But I might misunderstood your suggestion, though. Please feel free to chime in and correct my misunderstanding, if any. PS2: > I definitely considered just providing top level metrics for the SLRU cache Sure, we need to have detailed metrics for the parts of the SLRU cache. Re-using the set of LRU metrics for both the protected and the probational segments of the cache is quite natural and those stats provide information for generic troubleshooting and assessing the ratio of the segment sizes. However, you could provide function gauges for the high-level cache metrics that an end-user might be interested in, improving the usability for an end-user. If you prefer to address it in a separate patch, I'm OK with that approach. -- To view, visit http://gerrit.cloudera.org:8080/21389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c8181ec6bea301605aaef7db0003c3eaef3072d Gerrit-Change-Number: 21389 Gerrit-PatchSet: 4 Gerrit-Owner: Mahesh Reddy Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Comment-Date: Wed, 22 May 2024 02:59:10 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-613: Introduce SLRU cache
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20607 ) Change subject: KUDU-613: Introduce SLRU cache .. Patch Set 14: (10 comments) http://gerrit.cloudera.org:8080/#/c/20607/16//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20607/16//COMMIT_MSG@22 PS16, Line 22: > This was a DEBUG build run locally on macOS. 6 cores and 2.6 GHz. I'll try Thanks for clarifying. We are mostly interested in RELEASE build stats in the context of performance evaluation. DEBUG and sanitizer builds don't provide any meaningful stats since nobody runs DEBUG builds in production :) You can always create create a VM in hosted or public cloud providers, build Kudu from scratch there and run performance tests. I think it's essential to have the actual stats for the lookup operations provides for a RELEASE build in this description. http://gerrit.cloudera.org:8080/#/c/20607/14/src/kudu/util/cache.h File src/kudu/util/cache.h: http://gerrit.cloudera.org:8080/#/c/20607/14/src/kudu/util/cache.h@228 PS14, Line 228: // Sets probationary and protected segments metrics to update corresponding counters accordingly. : // : // An upgrade of one entry will increment the number of evictions from the probationary segment by : // one and the number of insertions into the protected segment by one. : // : // A downgrade of one entry will increment the number of evictions from the protected segment by : // one and the number of insertions into the probationary segment by one. : // : // To calculate the number of inserts for the SLRU cache, subtract the number of evictions : // of the protected segment from the number of inserts of the probationary segment. : // : // To calculate the number of evictions for the SLRU cache, subtract the number of insertions : // of the protected segment from the number of evictions of the probationary segment. : // : // Other than insertions and evictions, to calculate the metric for the SLRU cache, add the : // corresponding metric from the probationary and protected segment together. : virtual void SetSegmentMetrics(std::unique_ptr probationary_metrics, : std::unique_ptr protected_metrics, : ExistingMetricsPolicy metrics_policy) = 0; > SetMetrics() is used in other areas of the code that use a cache (file_cach OK, I guess one can try to circle back on the attempts to clean up this mess and bad smell in a follow-up patch. In this iteration, please add a TODO for the extra methods that are specific to the SLRU cache. One approach might be adding extra interface(s) and moving the specific methods out of the common Cache interface. Thank you! http://gerrit.cloudera.org:8080/#/c/20607/18/src/kudu/util/slru_cache-test.cc File src/kudu/util/slru_cache-test.cc: http://gerrit.cloudera.org:8080/#/c/20607/18/src/kudu/util/slru_cache-test.cc@442 PS18, Line 442: const int weight = static_cast(protected_segment_size_ / 100); It would be great to add a comment to explain what 'downgrade eviction' means along with short description on what essential invariants we have to track in this scenario and why. http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/slru_cache.h File src/kudu/util/slru_cache.h: http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/slru_cache.h@40 PS16, Line 40: NewSLRUCache > I guess then the question is what are the template parameters? It makes sen Those are the usual benefits of having common approach: this is to have more uniform code when instantiating cache, and it automatically helps in a templatized code. It could also help with differences in the signatures of the constructors. For example, one extra level of indirection to help with the different signatures of the constructors might be introducing CacheParameters template and hiding differences in the set of parameters for the constructors. That's a common way of doing so -- you could check SslTypeTraits in openssl_util.h for an inspiration. I guess we could keep this as-is for now, especially if you are not fond of getting rid of the code ugliness in this patch itself. There might be an opportunity to clean this up later on, but most likely this will stay as ugly as its left in the first version :) http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/slru_cache.cc File src/kudu/util/slru_cache.cc: http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/slru_cache.cc@57 PS16, Line 57: class SLRUCacheShard { > It's definitely possible, however, the cache file would be even more pollut IIUC, you could
[kudu-CR] KUDU-2671: Update upstream docs
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21108 ) Change subject: KUDU-2671: Update upstream docs .. Patch Set 1: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/21108/1/docs/schema_design.adoc File docs/schema_design.adoc: http://gerrit.cloudera.org:8080/#/c/21108/1/docs/schema_design.adoc@443 PS1, Line 443: table wide nit: table-wide ? -- To view, visit http://gerrit.cloudera.org:8080/21108 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8da554851a124d1d357be65d8bcc2c6c37875dcc Gerrit-Change-Number: 21108 Gerrit-PatchSet: 1 Gerrit-Owner: Mahesh Reddy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 21 May 2024 23:00:25 + Gerrit-HasComments: Yes
[kudu-CR](gh-pages) gh-pages should not tirgger build
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21404 ) Change subject: gh-pages should not tirgger build .. Patch Set 1: It seems the test has passed -- no build had been triggered. Probably, it's time to close/abandon this patch? :) -- To view, visit http://gerrit.cloudera.org:8080/21404 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I50aaf71ae5bf8a288d8c7347f1fe5ecdb5143654 Gerrit-Change-Number: 21404 Gerrit-PatchSet: 1 Gerrit-Owner: Marton Greber Gerrit-Reviewer: Alexey Serbin Gerrit-Comment-Date: Tue, 21 May 2024 20:55:26 + Gerrit-HasComments: No
[kudu-CR] [metrics] Add metrics for tablet copy op time
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21356 ) Change subject: [metrics] Add metrics for tablet copy op time .. Patch Set 8: (6 comments) http://gerrit.cloudera.org:8080/#/c/21356/8/src/kudu/tablet/tablet_metrics.cc File src/kudu/tablet/tablet_metrics.cc: http://gerrit.cloudera.org:8080/#/c/21356/8/src/kudu/tablet/tablet_metrics.cc@227 PS8, Line 227: Tablet Copy Operation Duration Would 'Tablet Copy Duration' be good enough? http://gerrit.cloudera.org:8080/#/c/21356/8/src/kudu/tablet/tablet_metrics.cc@229 PS8, Line 229: on this tablet Does it make sense to mention this is the duration as seen from the source tablet replica? http://gerrit.cloudera.org:8080/#/c/21356/8/src/kudu/tablet/tablet_metrics.cc@229 PS8, Line 229: copy tablet tablet copying http://gerrit.cloudera.org:8080/#/c/21356/8/src/kudu/tablet/tablet_metrics.cc@231 PS8, Line 231: 6000LU Yingchun already pointed at that in PS5, but it seems there is still room for improvement. With current settings, 6000 stands for maximum value of 1 minute (60 seconds). Are you sure this makes sense? I suspect that for a large tablet replica being copied over a slow network it might take tens of minutes to complete the operation, so an hour for the maximum duration would be prudent. Also, I don't think it makes a lot of sense to have microseconds for the unit for the duration here. I'd think it to be milliseconds at most. http://gerrit.cloudera.org:8080/#/c/21356/8/src/kudu/tserver/tablet_copy_service-test.cc File src/kudu/tserver/tablet_copy_service-test.cc: http://gerrit.cloudera.org:8080/#/c/21356/8/src/kudu/tserver/tablet_copy_service-test.cc@237 PS8, Line 237: TEST_F(TabletCopyMetricTest, TestRunTimeMetricFinishState) { : const auto before_cnt = CopyRunTime()->TotalCount(); : string session_id; : ASSERT_OK(DoBeginValidTabletCopySession(_id)); : : EndTabletCopySessionResponsePB resp; : RpcController controller; : ASSERT_OK(DoEndTabletCopySession(session_id, true, nullptr, , )); : ASSERT_EQ(before_cnt + 1, CopyRunTime()->TotalCount()); : } Does it make sense to verify the corresponding metric both at the destination tablet replica as well? I guess it should show 0 for TotalCount(), right? http://gerrit.cloudera.org:8080/#/c/21356/8/src/kudu/tserver/tablet_copy_source_session.cc File src/kudu/tserver/tablet_copy_source_session.cc: http://gerrit.cloudera.org:8080/#/c/21356/8/src/kudu/tserver/tablet_copy_source_session.cc@514 PS8, Line 514: int64_t elapsed_ms = (MonoTime::Now() - start_time_).ToMilliseconds(); : metrics->tablet_copy_duration->Increment(elapsed_ms); In tablet_metrics.cc, the tablet_copy_duration is defined in microseconds is PS8, so this code is inconsistent. -- To view, visit http://gerrit.cloudera.org:8080/21356 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I088f6a9a8a07ad39ca95ae8b4995ce00d1a0d00c Gerrit-Change-Number: 21356 Gerrit-PatchSet: 8 Gerrit-Owner: KeDeng Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: KeDeng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Fri, 17 May 2024 18:14:55 + Gerrit-HasComments: Yes
[kudu-CR] [security-flags-itest] Fix missing command line flags
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21399 ) Change subject: [security-flags-itest] Fix missing command line flags .. Patch Set 4: Code-Review+2 Thank you for the fix! -- To view, visit http://gerrit.cloudera.org:8080/21399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec751e8761562612d97b886740c9b20cd134a0bc Gerrit-Change-Number: 21399 Gerrit-PatchSet: 4 Gerrit-Owner: Ádám Bakai Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Ádám Bakai Gerrit-Comment-Date: Wed, 15 May 2024 15:57:32 + Gerrit-HasComments: No
[kudu-CR] [security-flags-itest] Fix missing command line flags
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21399 ) Change subject: [security-flags-itest] Fix missing command line flags .. [security-flags-itest] Fix missing command line flags It is a known phenomenon, that static libraries won't be included into an executable if there are no usage for any function or variable[1]. This means the initialization routines in the library won't be executed, even if these initialization routines have side effects, such as registering the variable in the gflags ecosystem. As a result, the CheckRpcAuthnFlagsGroupValidator test failed because "rpc_authentication" flag was not registered properly. To solve this issue, a command line variable check is added, so now the library will be used in the executable and the initialization routines will be executed. [1] https://stackoverflow.com/questions/1229430/how-do-i-prevent-my-unused-global-variables-being-compiled-out-of-my-static-li Change-Id: Iec751e8761562612d97b886740c9b20cd134a0bc Reviewed-on: http://gerrit.cloudera.org:8080/21399 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin --- M src/kudu/integration-tests/security-flags-itest.cc 1 file changed, 12 insertions(+), 0 deletions(-) Approvals: Kudu Jenkins: Verified Alexey Serbin: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iec751e8761562612d97b886740c9b20cd134a0bc Gerrit-Change-Number: 21399 Gerrit-PatchSet: 5 Gerrit-Owner: Ádám Bakai Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Ádám Bakai
[kudu-CR] [common] get rid of MutexLock
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21415 ) Change subject: [common] get rid of MutexLock .. [common] get rid of MutexLock Since contemporary STL library provides both std::lock_guard and std::unique_lock, there is no need to keep MutexLock. Change-Id: I49e0ef2c688ef8be74d018bb9bffe70b6655e654 Reviewed-on: http://gerrit.cloudera.org:8080/21415 Tested-by: Alexey Serbin Reviewed-by: Abhishek Chennaka --- M src/kudu/clock/builtin_ntp.cc M src/kudu/master/auto_leader_rebalancer.cc M src/kudu/master/catalog_manager.cc M src/kudu/rpc/service_pool.cc M src/kudu/rpc/service_pool.h M src/kudu/rpc/service_queue.h M src/kudu/server/diagnostics_log.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tools/table_scanner.cc M src/kudu/tserver/heartbeater.cc M src/kudu/tserver/scanners.cc M src/kudu/tserver/scanners.h M src/kudu/tserver/tablet_copy_service.cc M src/kudu/tserver/tablet_copy_service.h M src/kudu/util/async_logger.cc M src/kudu/util/async_logger.h M src/kudu/util/barrier.h M src/kudu/util/blocking_queue-test.cc M src/kudu/util/blocking_queue.h M src/kudu/util/cloud/instance_detector.cc M src/kudu/util/countdown_latch.h M src/kudu/util/debug/trace_event_impl.cc M src/kudu/util/debug/trace_event_synthetic_delay.cc M src/kudu/util/kernel_stack_watchdog.cc M src/kudu/util/mem_tracker.cc M src/kudu/util/mem_tracker.h M src/kudu/util/mutex.h M src/kudu/util/pstack_watcher.cc M src/kudu/util/rwc_lock.cc M src/kudu/util/test_graph.cc M src/kudu/util/threadpool.cc M src/kudu/util/threadpool.h 33 files changed, 146 insertions(+), 197 deletions(-) Approvals: Alexey Serbin: Verified Abhishek Chennaka: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I49e0ef2c688ef8be74d018bb9bffe70b6655e654 Gerrit-Change-Number: 21415 Gerrit-PatchSet: 5 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Zoltan Chovan
[kudu-CR] [common] switch from unique lock to lock guard
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21414 ) Change subject: [common] switch from unique_lock to lock_guard .. [common] switch from unique_lock to lock_guard To simplify working with STL synchronization primitives, this patch changes from std::unique_lock to std::lock_guard where appropriate. In addition, IWYU and ClangTidy's feedback has been addressed. Change-Id: I72d86ca730113ec0652154d5ce509fc2e479befb Reviewed-on: http://gerrit.cloudera.org:8080/21414 Tested-by: Alexey Serbin Reviewed-by: Zoltan Chovan Reviewed-by: Abhishek Chennaka --- M src/kudu/clock/hybrid_clock.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/log_cache.cc M src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/rpc/rpcz_store.cc M src/kudu/security/tls_context.cc M src/kudu/server/webserver.cc M src/kudu/tablet/lock_manager-test.cc M src/kudu/tablet/lock_manager.cc M src/kudu/tablet/tablet.cc M src/kudu/tserver/scanners.cc M src/kudu/util/maintenance_manager.cc M src/kudu/util/trace_metrics.h 14 files changed, 36 insertions(+), 31 deletions(-) Approvals: Alexey Serbin: Verified Zoltan Chovan: Looks good to me, but someone else must approve Abhishek Chennaka: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I72d86ca730113ec0652154d5ce509fc2e479befb Gerrit-Change-Number: 21414 Gerrit-PatchSet: 3 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Zoltan Chovan
[kudu-CR] [metrics] Add tablet level metrics for scans op time
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21291 ) Change subject: [metrics] Add tablet level metrics for scans op time .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/21291/4/src/kudu/tablet/tablet_metrics.h File src/kudu/tablet/tablet_metrics.h: http://gerrit.cloudera.org:8080/#/c/21291/4/src/kudu/tablet/tablet_metrics.h@68 PS4, Line 68: scoped_refptr tablet_scans_duration_wall_time; : scoped_refptr tablet_scans_duration_system_time; That's interesting: the system and the wall times are collected, but not the user time. Why? http://gerrit.cloudera.org:8080/#/c/21291/4/src/kudu/tablet/tablet_metrics.cc File src/kudu/tablet/tablet_metrics.cc: http://gerrit.cloudera.org:8080/#/c/21291/4/src/kudu/tablet/tablet_metrics.cc@160 PS4, Line 160: kInfo Here and below: I'm not sure this metric is anything but extra information which might be needed for troubleshooting purposes only: for each active scanner, corresponding information is already displayed at the 'Dashboard' tab of the embedded WebUI. With that, I'd think of kDebug level, not kInfo. http://gerrit.cloudera.org:8080/#/c/21291/4/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: http://gerrit.cloudera.org:8080/#/c/21291/4/src/kudu/tserver/tablet_service.cc@3215 PS4, Line 3215: auto elapsed = scanner_timer.GetElapsed(); There are several earlier returns from this method where scanner times are updated automatically by the scoped-like construct ScopedAddScannerTiming. However, the metrics aren't updated accordingly. Why? -- To view, visit http://gerrit.cloudera.org:8080/21291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8f490cfb6f37aef60b34697100fb502374fcc503 Gerrit-Change-Number: 21291 Gerrit-PatchSet: 4 Gerrit-Owner: KeDeng Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: KeDeng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Tue, 14 May 2024 22:15:37 + Gerrit-HasComments: Yes
[kudu-CR] [metrics] Add metrics for participant op time
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21375 ) Change subject: [metrics] Add metrics for participant op time .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/21375/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21375/4//COMMIT_MSG@9 PS4, Line 9: Add tablet-level statistics to track the time consumption of : participant operations. Have you ever used multi-row transactions in one of your Kudu clusters? If talking about TXN participant meta-operations, I'd recommend introducing new metrics based on particular needs and observations from the field, not just blindly adding a metric to fill in the blank spots. One source of strange smell in this patch is having an umbrella timings metric for very different types of TXN state transitions. AFAIK, the timings for FINALIZE_COMMIT and ABORT_TXN might be different order of magnitude compared with BEGIN_TXN due to their nature, and accumulating them under the same umbrella looks like quite a strange idea to me. -- To view, visit http://gerrit.cloudera.org:8080/21375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie25d4bee27e9ba35925634b61fb518467bb59201 Gerrit-Change-Number: 21375 Gerrit-PatchSet: 4 Gerrit-Owner: KeDeng Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: KeDeng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Tue, 14 May 2024 21:47:29 + Gerrit-HasComments: Yes
[kudu-CR] [common] switch from unique lock to lock guard
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21414 ) Change subject: [common] switch from unique_lock to lock_guard .. Patch Set 2: > (1 comment) One more reference that might be useful in this context: https://stackoverflow.com/questions/43019598/stdlock-guard-or-stdscoped-lock -- To view, visit http://gerrit.cloudera.org:8080/21414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I72d86ca730113ec0652154d5ce509fc2e479befb Gerrit-Change-Number: 21414 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Zoltan Chovan Gerrit-Comment-Date: Tue, 14 May 2024 21:22:53 + Gerrit-HasComments: No
[kudu-CR] [common] get rid of MutexLock
Alexey Serbin has removed a vote on this change. Change subject: [common] get rid of MutexLock .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/21415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I49e0ef2c688ef8be74d018bb9bffe70b6655e654 Gerrit-Change-Number: 21415 Gerrit-PatchSet: 4 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Zoltan Chovan
[kudu-CR] [common] get rid of MutexLock
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21415 ) Change subject: [common] get rid of MutexLock .. Patch Set 4: Verified+1 unrelated test failures, posted a fix to address at least one of them: https://gerrit.cloudera.org/#/c/21427/ -- To view, visit http://gerrit.cloudera.org:8080/21415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I49e0ef2c688ef8be74d018bb9bffe70b6655e654 Gerrit-Change-Number: 21415 Gerrit-PatchSet: 4 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Zoltan Chovan Gerrit-Comment-Date: Tue, 14 May 2024 21:21:02 + Gerrit-HasComments: No
[kudu-CR] [alter table-test] Increase assertion timeout
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21347 ) Change subject: [alter_table-test] Increase assertion timeout .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/21347/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21347/1//COMMIT_MSG@10 PS1, Line 10: So timeout was increased to 60 seconds. > This looks like a different issue than KUDU-3509. You might need to update the build-support/dist_test.py to make it possible to start dist-test from binaries build at Ubuntu 22.04. If you want to run the test via dist-test, just build it on something like ubuntu18 or CentOS7 -- that's known to work. Otherwise, you could run the test locally with many iterations (i.e. setting --gtest_iterations=1000 or alike), and maybe adding extra flags to make the issue manifest itself more often (e.g., adding --stress_cpu_threads=8 or alike). The idea is to get the stats on the flakiness before and after the fix to make sure this patch addresses the issue. -- To view, visit http://gerrit.cloudera.org:8080/21347 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifc51365a798569c22e680eb3e7723a5f330d43c5 Gerrit-Change-Number: 21347 Gerrit-PatchSet: 1 Gerrit-Owner: Ádám Bakai Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Ádám Bakai Gerrit-Comment-Date: Tue, 14 May 2024 21:14:11 + Gerrit-HasComments: Yes
[kudu-CR] [tablet] fix race accessing OpState's start time
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21427 Change subject: [tablet] fix race accessing OpState's start time .. [tablet] fix race accessing OpState's start time This patch addresses a race reported by TSAN with traces like below: WARNING: ThreadSanitizer: data race (pid=11024) Write of size 8 at 0x7b580011f260 by thread T174: #0 kudu::tablet::OpState::set_start_time(kudu::MonoTime) src/kudu/tablet/ops/op.h:274:58 #1 kudu::tablet::WriteOp::Start() src/kudu/tablet/ops/write_op.cc:273:11 #2 kudu::tablet::OpDriver::Prepare() src/kudu/tablet/ops/op_driver.cc:329:7 #3 kudu::tablet::OpDriver::PrepareTask() src/kudu/tablet/ops/op_driver.cc:249:31 ... Previous read of size 8 at 0x7b580011f260 by thread T5 (mutexes: write M835553159786377312): #0 kudu::tablet::OpState::start_time() const src/kudu/tablet/ops/op.h:272:40 #1 kudu::tablet::WriteOp::ToString() const src/kudu/tablet/ops/write_op.cc:378:36 #2 kudu::tablet::OpDriver::ToStringUnlocked() const src/kudu/tablet/ops/op_driver.cc:209:23 #3 kudu::tablet::OpDriver::ToString() const src/kudu/tablet/ops/op_driver.cc:203:10 #4 kudu::tablet::TabletReplica::GetInFlightOps(...) const src/kudu/tablet/tablet_replica.cc:728:41 #5 kudu::tserver::TabletServerPathHandlers::HandleTransactionsPage(...) src/kudu/tserver/tserver_path_handlers.cc:286:14 ... Change-Id: I52de0840aa20f64cf15c7a9da2d553257c7e85e7 --- M src/kudu/tablet/ops/op.h 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/21427/1 -- To view, visit http://gerrit.cloudera.org:8080/21427 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I52de0840aa20f64cf15c7a9da2d553257c7e85e7 Gerrit-Change-Number: 21427 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin
[kudu-CR] [security-flags-itest] Fix missing command line flags
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21399 ) Change subject: [security-flags-itest] Fix missing command line flags .. Patch Set 3: Code-Review+1 > Please add a proper description of the problem and the solution. > > + Mention that the test fails because the flag is not registered > properly. > + The flag is defined in server_process.a. We do not use any > function or variable from it. > + It is a known C++ phenomenon that the INITIALIZATION ROUTINES of > the library might not be included if no symbol is used from the > library (despite the initialization routines having an effect on > used functionality, such as registering the variable in the gflags > ecosystem). > > Also, --whole-archive fixes the issue and works perfectly if you > use it only for the problematic library. The first google result > for "how to add --whole-archive in cmake" works. You are doing > something wrong... I'm OK with the current solution. I'm not sure that it's worth bringing in the --whole-archive alternative: we do have the LTO optimizations turned on, so it would be necessary to clarify how it affects the other binaries where the util library is linked into. -- To view, visit http://gerrit.cloudera.org:8080/21399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec751e8761562612d97b886740c9b20cd134a0bc Gerrit-Change-Number: 21399 Gerrit-PatchSet: 3 Gerrit-Owner: Ádám Bakai Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Zoltan Martonka Gerrit-Reviewer: Ádám Bakai Gerrit-Comment-Date: Tue, 14 May 2024 17:51:01 + Gerrit-HasComments: No
[kudu-CR] [security-flags-itest] Fix missing command line flags
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21399 ) Change subject: [security-flags-itest] Fix missing command line flags .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec751e8761562612d97b886740c9b20cd134a0bc Gerrit-Change-Number: 21399 Gerrit-PatchSet: 3 Gerrit-Owner: Ádám Bakai Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Zoltan Martonka Gerrit-Reviewer: Ádám Bakai Gerrit-Comment-Date: Tue, 14 May 2024 17:47:38 + Gerrit-HasComments: No
[kudu-CR] [common] get rid of MutexLock
Hello Marton Greber, Zoltan Chovan, Attila Bukor, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21415 to look at the new patch set (#4). Change subject: [common] get rid of MutexLock .. [common] get rid of MutexLock Since contemporary STL library provides both std::lock_guard and std::unique_lock, there is no need to keep MutexLock. Change-Id: I49e0ef2c688ef8be74d018bb9bffe70b6655e654 --- M src/kudu/clock/builtin_ntp.cc M src/kudu/master/auto_leader_rebalancer.cc M src/kudu/master/catalog_manager.cc M src/kudu/rpc/service_pool.cc M src/kudu/rpc/service_pool.h M src/kudu/rpc/service_queue.h M src/kudu/server/diagnostics_log.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tools/table_scanner.cc M src/kudu/tserver/heartbeater.cc M src/kudu/tserver/scanners.cc M src/kudu/tserver/scanners.h M src/kudu/tserver/tablet_copy_service.cc M src/kudu/tserver/tablet_copy_service.h M src/kudu/util/async_logger.cc M src/kudu/util/async_logger.h M src/kudu/util/barrier.h M src/kudu/util/blocking_queue-test.cc M src/kudu/util/blocking_queue.h M src/kudu/util/cloud/instance_detector.cc M src/kudu/util/countdown_latch.h M src/kudu/util/debug/trace_event_impl.cc M src/kudu/util/debug/trace_event_synthetic_delay.cc M src/kudu/util/kernel_stack_watchdog.cc M src/kudu/util/mem_tracker.cc M src/kudu/util/mem_tracker.h M src/kudu/util/mutex.h M src/kudu/util/pstack_watcher.cc M src/kudu/util/rwc_lock.cc M src/kudu/util/test_graph.cc M src/kudu/util/threadpool.cc M src/kudu/util/threadpool.h 33 files changed, 146 insertions(+), 197 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/21415/4 -- To view, visit http://gerrit.cloudera.org:8080/21415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I49e0ef2c688ef8be74d018bb9bffe70b6655e654 Gerrit-Change-Number: 21415 Gerrit-PatchSet: 4 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Zoltan Chovan
[kudu-CR] [common] get rid of MutexLock
Hello Marton Greber, Zoltan Chovan, Attila Bukor, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21415 to look at the new patch set (#3). Change subject: [common] get rid of MutexLock .. [common] get rid of MutexLock Since contemporary STL library provides both std::lock_guard and std::unique_lock, there is no need to keep MutexLock. Change-Id: I49e0ef2c688ef8be74d018bb9bffe70b6655e654 --- M src/kudu/clock/builtin_ntp.cc M src/kudu/master/auto_leader_rebalancer.cc M src/kudu/master/catalog_manager.cc M src/kudu/rpc/service_pool.cc M src/kudu/rpc/service_pool.h M src/kudu/rpc/service_queue.h M src/kudu/server/diagnostics_log.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tools/table_scanner.cc M src/kudu/tserver/heartbeater.cc M src/kudu/tserver/scanners.cc M src/kudu/tserver/scanners.h M src/kudu/tserver/tablet_copy_service.cc M src/kudu/tserver/tablet_copy_service.h M src/kudu/util/async_logger.cc M src/kudu/util/async_logger.h M src/kudu/util/barrier.h M src/kudu/util/blocking_queue-test.cc M src/kudu/util/blocking_queue.h M src/kudu/util/cloud/instance_detector.cc M src/kudu/util/countdown_latch.h M src/kudu/util/debug/trace_event_impl.cc M src/kudu/util/debug/trace_event_synthetic_delay.cc M src/kudu/util/kernel_stack_watchdog.cc M src/kudu/util/mem_tracker.cc M src/kudu/util/mem_tracker.h M src/kudu/util/mutex.h M src/kudu/util/pstack_watcher.cc M src/kudu/util/rwc_lock.cc M src/kudu/util/test_graph.cc M src/kudu/util/threadpool.cc M src/kudu/util/threadpool.h 33 files changed, 144 insertions(+), 197 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/21415/3 -- To view, visit http://gerrit.cloudera.org:8080/21415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I49e0ef2c688ef8be74d018bb9bffe70b6655e654 Gerrit-Change-Number: 21415 Gerrit-PatchSet: 3 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Zoltan Chovan
[kudu-CR] [common] get rid of MutexLock
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21415 ) Change subject: [common] get rid of MutexLock .. Patch Set 2: > > > Patch Set 2: Code-Review+1 > > > > > > My only concern/nit about this change is wether we're loosing a > > bit of code readability, many places a simple "lock" is used for > > name, and the information about the type of the lock is a bit > > harder to find out (e.g. have to look up the header file and find > > the lock declaration to see the lock type). > > > It's not a huge issue, and renaming every instance to more > > specific names seems excessive, just wanted to bring it up and > hear > > your thoughts. > > > > I think I have to clarify a bit, I meant that using > > 'decltype(insert-lock-name)' hides the lock type information, but > I > > might be just splitting hairs here :) > > That information in one click away if using an IDE (or one > key-press away if using YCM or other editors plugins). Yes, it's > no longer Yes, the type information is no longer duplicated as it used to be, but I don't see it as something negative. Take a look at other class members used around in the same .cc file: they have information on their type only in the .h header file where they are declared/defined, and everybody just uses the members in the .cc file without any problem. -- To view, visit http://gerrit.cloudera.org:8080/21415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I49e0ef2c688ef8be74d018bb9bffe70b6655e654 Gerrit-Change-Number: 21415 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Zoltan Chovan Gerrit-Comment-Date: Tue, 14 May 2024 05:43:37 + Gerrit-HasComments: No
[kudu-CR] [common] get rid of MutexLock
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21415 ) Change subject: [common] get rid of MutexLock .. Patch Set 2: > > Patch Set 2: Code-Review+1 > > > > My only concern/nit about this change is wether we're loosing a > bit of code readability, many places a simple "lock" is used for > name, and the information about the type of the lock is a bit > harder to find out (e.g. have to look up the header file and find > the lock declaration to see the lock type). > > It's not a huge issue, and renaming every instance to more > specific names seems excessive, just wanted to bring it up and hear > your thoughts. > > I think I have to clarify a bit, I meant that using > 'decltype(insert-lock-name)' hides the lock type information, but I > might be just splitting hairs here :) That information in one click away if using an IDE (or one key-press away if using YCM or other editors plugins). Yes, it's no longer -- To view, visit http://gerrit.cloudera.org:8080/21415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I49e0ef2c688ef8be74d018bb9bffe70b6655e654 Gerrit-Change-Number: 21415 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Zoltan Chovan Gerrit-Comment-Date: Tue, 14 May 2024 05:25:36 + Gerrit-HasComments: No
[kudu-CR] KUDU-3568 Fix compaction budgeting test by setting memory hard limit
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21416 ) Change subject: KUDU-3568 Fix compaction budgeting test by setting memory hard limit .. KUDU-3568 Fix compaction budgeting test by setting memory hard limit TestRowSetCompactionSkipWithBudgetingConstraints can fail if the memory on node running the test is high. It happens because the test generates deltas of size worth a few MBs that is multiplied with a preset factor to ensure the result (i.e. memory required for rowset compaction completion) is of high value of the order of 200 GB per rowset. Even though nodes running the test generally don't have so much physical memory, it is still possible to end up with high memory nodes. On such nodes, the test might fail. The patch fixes that problem by deterministically ensuring that compaction memory requirement is always higher than the memory hard limit. It does that by doing the following: 1. Move out the budgeting compaction tests out in a separate binary. 2. This gives flexibility to set the memory hard limit as per test needs. It is important to node that once a memory hard limit is set, it remains the same for all tests executed through binary lifecycle. 3. Set the hard memory limit to 1 GB which is enough to handle compaction requirements for TestRowSetCompactionProceedWithNoBudgetingConstraints. For TestRowSetCompactionSkipWithBudgetingConstraints, it is not enough because we set the delta memory factor high to exceed 1 GB. Both the test are now expected to succeed deterministically. Change-Id: I85d104e1d066507ce8e72a00cc5165cc4b85e48d Reviewed-on: http://gerrit.cloudera.org:8080/21416 Tested-by: Alexey Serbin Reviewed-by: Alexey Serbin --- M src/kudu/tablet/CMakeLists.txt A src/kudu/tablet/compaction-highmem-test.cc M src/kudu/tablet/compaction-test.cc 3 files changed, 221 insertions(+), 143 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/21416 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I85d104e1d066507ce8e72a00cc5165cc4b85e48d Gerrit-Change-Number: 21416 Gerrit-PatchSet: 3 Gerrit-Owner: Ashwani Raina Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-3568 Fix compaction budgeting test by setting memory hard limit
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21416 ) Change subject: KUDU-3568 Fix compaction budgeting test by setting memory hard limit .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21416 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I85d104e1d066507ce8e72a00cc5165cc4b85e48d Gerrit-Change-Number: 21416 Gerrit-PatchSet: 2 Gerrit-Owner: Ashwani Raina Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 13 May 2024 16:01:51 + Gerrit-HasComments: No
[kudu-CR] KUDU-3568 Fix compaction budgeting test by setting memory hard limit
Alexey Serbin has removed a vote on this change. Change subject: KUDU-3568 Fix compaction budgeting test by setting memory hard limit .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/21416 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I85d104e1d066507ce8e72a00cc5165cc4b85e48d Gerrit-Change-Number: 21416 Gerrit-PatchSet: 2 Gerrit-Owner: Ashwani Raina Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-3568 Fix compaction budgeting test by setting memory hard limit
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21416 ) Change subject: KUDU-3568 Fix compaction budgeting test by setting memory hard limit .. Patch Set 2: Verified+1 unrelated test failures -- To view, visit http://gerrit.cloudera.org:8080/21416 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I85d104e1d066507ce8e72a00cc5165cc4b85e48d Gerrit-Change-Number: 21416 Gerrit-PatchSet: 2 Gerrit-Owner: Ashwani Raina Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 13 May 2024 16:01:48 + Gerrit-HasComments: No
[kudu-CR] KUDU-3568 Fix compaction budgeting test by setting memory hard limit
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21416 ) Change subject: KUDU-3568 Fix compaction budgeting test by setting memory hard limit .. Patch Set 1: Code-Review+1 (5 comments) overall looks good to me, just a few nits http://gerrit.cloudera.org:8080/#/c/21416/1/src/kudu/tablet/compaction-highmem-test.cc File src/kudu/tablet/compaction-highmem-test.cc: http://gerrit.cloudera.org:8080/#/c/21416/1/src/kudu/tablet/compaction-highmem-test.cc@18 PS1, Line 18: #include nit: please use C++ style headers, i.e. http://gerrit.cloudera.org:8080/#/c/21416/1/src/kudu/tablet/compaction-highmem-test.cc@142 PS1, Line 142: (1024 * 1024 * 1024) nit: drop the parentheses? http://gerrit.cloudera.org:8080/#/c/21416/1/src/kudu/tablet/compaction-highmem-test.cc@142 PS1, Line 142: FLAGS_memory_limit_hard_bytes = (1024 * 1024 * 1024); : : FLAGS_rowset_compaction_ancient_delta_threshold_enabled = true; : FLAGS_rowset_compaction_enforce_preset_factor = true; : FLAGS_rowset_compaction_memory_estimate_enabled = true; : : // Ensure memory budgeting applies : FLAGS_rowset_compaction_estimate_min_deltas_size_mb = 0; Since this is in a dedicated binary now, consider setting these flags just once in per-suite static setup function called SetUpTestSuite(). For details, see https://google.github.io/googletest/advanced.html#sharing-resources-between-tests-in-the-same-test-suite http://gerrit.cloudera.org:8080/#/c/21416/1/src/kudu/tablet/compaction-highmem-test.cc@166 PS1, Line 166: sw.elapsed().ToString(), : trace->MetricsAsJSON()); nit: please fix the indent http://gerrit.cloudera.org:8080/#/c/21416/1/src/kudu/tablet/compaction-highmem-test.cc@179 PS1, Line 179: const uint32_t num_rowsets = 10; : const uint32_t num_rows_per_rowset = 2; : const uint32_t num_updates = 5000 * size_factor; nit: could make these constexpr ? -- To view, visit http://gerrit.cloudera.org:8080/21416 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I85d104e1d066507ce8e72a00cc5165cc4b85e48d Gerrit-Change-Number: 21416 Gerrit-PatchSet: 1 Gerrit-Owner: Ashwani Raina Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Sat, 11 May 2024 16:51:14 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-613: Introduce SLRU cache
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20607 ) Change subject: KUDU-613: Introduce SLRU cache .. Patch Set 17: (1 comment) http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc File src/kudu/util/slru_cache.cc: http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@543 PS17, Line 543: // Release from either the probationary or the protected shard. : if (!e->protected_segment) { : probationary_shard_->Release(handle); : } else { : protected_shard_->Release(handle); : } : } What guards against the following scenarios: Scenario_0: * thread_A: an instance of Cache::UniqueHandle as a wrapper for entry_X is being destroyed * thread_A: while executing SLRUCacheShardPair::Release(), finds entry_X isn't in the protected shard * thread_B: calls SLRUCacheShardPair::Lookup(), moving entry_X into the protected shard * thread_A: calls Release() on the corresponding probationary shard for entry_X Scenario_1: * thread_A: an instance of Cache::UniqueHandle as a wrapper for entry_X is being destroyed * thread_A: while executing SLRUCacheShardPair::Release(), finds entry_X is in the protected shard * thread_B: calls SLRUCacheShardPair::Lookup() for some other entry_Y which is in the probationary shard, but due to crossing the threshold of lookup operations per entry, it's now being moved into the protected shard, so entry_X is pushed out of the protected shard * thread_A: calls Release() on the corresponding protected shard for entry_X If nothing prevents from having such scenarios, then a few questions: * Does the contents SLRU cache stay consistent after that? * Do metrics of the SLRU cache stay consistent after that? -- To view, visit http://gerrit.cloudera.org:8080/20607 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45531534a2049dd38c002f4dc7982df9fd46e0bb Gerrit-Change-Number: 20607 Gerrit-PatchSet: 17 Gerrit-Owner: Mahesh Reddy Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Marton Greber Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 11 May 2024 01:09:24 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-613: Introduce SLRU cache
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20607 ) Change subject: KUDU-613: Introduce SLRU cache .. Patch Set 16: (10 comments) http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/slru_cache.cc File src/kudu/util/slru_cache.cc: http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/slru_cache.cc@254 PS16, Line 254: e = table_.Lookup(key, hash); What guarantees do we have to be sure that 'table_' cannot be modified under the hood by concurrently running SLRUCacheShard::Insert()? It would be great to add a note about that. http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/slru_cache.cc@267 PS16, Line 267: e = table_.Lookup(key, hash); ditto: what if Insert() is running concurrently and modifying the 'table_' -- what protects us from such concurrency issues? Would be great to add a note in the code. http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc File src/kudu/util/slru_cache.cc: http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@273 PS17, Line 273: bool last_reference = Unref(e); : if (last_reference) { nit: could these be joined, adding a comment on what's the intention of this code? http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@471 PS17, Line 471: { What is the purpose of this extra scope? http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@501 PS17, Line 501: { What is the purpose of this extra scope? http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@552 PS17, Line 552: { What is the purpose of this extra scope? http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@561 PS17, Line 561: { What is the purpose of this extra scope? http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@581 PS17, Line 581: nit: http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@601 PS17, Line 601: // Similar to STLDeleteElements(). : auto begin = shards_.begin(); : while (begin != shards_.end()) { : auto temp = begin; : ++begin; : temp->reset(nullptr); : } Is this really needed? Wouldn't the contained unique_ptr be destroyed automatically upon the destruction of the containing vector? http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@694 PS17, Line 694: uint8_t* data = reinterpret_cast(h); : delete [] data; nit: could these lines be joined? -- To view, visit http://gerrit.cloudera.org:8080/20607 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45531534a2049dd38c002f4dc7982df9fd46e0bb Gerrit-Change-Number: 20607 Gerrit-PatchSet: 16 Gerrit-Owner: Mahesh Reddy Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Marton Greber Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 10 May 2024 21:33:01 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-613: Introduce SLRU cache
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20607 ) Change subject: KUDU-613: Introduce SLRU cache .. Patch Set 17: (1 comment) http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/cache.h File src/kudu/util/cache.h: http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/cache.h@520 PS17, Line 520: // Returns true if probationary segment contains key. : // Returns false if not. : // Only used for SLRU cache. : virtual bool ProbationaryContains(const Slice& key) = 0; : : // Returns true if protected segment contains key. : // Returns false if not. : // Only used for SLRU cache. : virtual bool ProtectedContains(const Slice& key) = 0; > Similar to SetSegmentMetrics(), this doesn't seem to be a place for these m SLRUCacheShard is the right place for these, even without any extras, I don't see any reason to keep these specific methods in this generic interface (but I might be missing something). -- To view, visit http://gerrit.cloudera.org:8080/20607 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45531534a2049dd38c002f4dc7982df9fd46e0bb Gerrit-Change-Number: 20607 Gerrit-PatchSet: 17 Gerrit-Owner: Mahesh Reddy Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Marton Greber Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 10 May 2024 20:57:44 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-613: Introduce SLRU cache
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20607 ) Change subject: KUDU-613: Introduce SLRU cache .. Patch Set 17: (16 comments) A few more high-level observations. http://gerrit.cloudera.org:8080/#/c/20607/16//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20607/16//COMMIT_MSG@22 PS16, Line 22: Lookups/sec It would be nice to mentions the following details: * What type of build it was? * What sort of machine it was: CPU cores and frequency? It's crucial to measure this for RELEASE builds, we aren't interested to look at other types of builds here, I guess. http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/cache-bench.cc File src/kudu/util/cache-bench.cc: http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/cache-bench.cc@92 PS16, Line 92: ret+= " SLRU"; nit for here and elsewhere: separate variable and the operator by space http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/cache.h File src/kudu/util/cache.h: http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/cache.h@520 PS17, Line 520: // Returns true if probationary segment contains key. : // Returns false if not. : // Only used for SLRU cache. : virtual bool ProbationaryContains(const Slice& key) = 0; : : // Returns true if protected segment contains key. : // Returns false if not. : // Only used for SLRU cache. : virtual bool ProtectedContains(const Slice& key) = 0; Similar to SetSegmentMetrics(), this doesn't seem to be a place for these methods. http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/cache.h File src/kudu/util/cache.h: http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/cache.h@58 PS16, Line 58: Protects against random/long reads polluting cache. nit: consider either making this part of the comment more generic (as it should be at this level) or dropping this part altogether http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/cache.h@91 PS16, Line 91: protected_segment nit: should be rather named 'in_protected_segment'? http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/cache.h@116 PS16, Line 116: sanitize style nit: since this isn't an accessor-like method, its name should be capitalized http://gerrit.cloudera.org:8080/#/c/20607/14/src/kudu/util/cache.h File src/kudu/util/cache.h: http://gerrit.cloudera.org:8080/#/c/20607/14/src/kudu/util/cache.h@228 PS14, Line 228: // ensure this is safe, e.g. by destructing the entity that owned the : // original metrics. : kReset, : }; : // Set the cache metrics to update corresponding counters accordingly. : virtual void SetMetrics(std::unique_ptr metrics, : ExistingMetricsPolicy metrics_policy) = 0; : : // Sets probationary and protected segments metrics to update corresponding counters accordingly. : // : // An upgrade of one entry will increment the number of evictions from the probationary segment by : // one and the number of insertions into the protected segment by one. : // : // A downgrade of one entry will increment the number of evictions from the protected segment by : // one and the number of insertions into the probationary segment by one. : // : // To calculate the number of inserts for the SLRU cache, subtract the number of evictions : // of the protected segment from the number of inserts of the probationary segment. : // > This will be used in the block cache to set the metrics for both segments o OK. But even with that, instead of polluting the interface of the base Cache class you could make the BlockCache class to be a template by the type of the underlying cache implementation, and remove both SetMetrics() and SetSegmentMetrics() from the common Cache interface. http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/nvm_cache.h File src/kudu/util/nvm_cache.h: http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/nvm_cache.h@46 PS16, Line 46: // Number of lookups for this entry. Used for upgrading to protected segment in SLRU cache. : // Resets to 0 when moved between probationary and protected segments in both directions. : uint32_t lookups; Is this really needed? I didn't see NVM-based implementation of SLRU cache, so this looks a bit strange given you decided not to converge on {S}LRUHandle structure http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/slru_cache.h File src/kudu/util/slru_cache.h:
[kudu-CR] KUDU-613: Integrate SLRU cache into block cache
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21390 ) Change subject: KUDU-613: Integrate SLRU cache into block cache .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/21390/2/src/kudu/cfile/block_cache.cc File src/kudu/cfile/block_cache.cc: http://gerrit.cloudera.org:8080/#/c/21390/2/src/kudu/cfile/block_cache.cc@235 PS2, Line 235: BlockCache::BlockCache() : : BlockCache(FLAGS_block_cache_capacity_mb * 1024 * 1024) { : } Interesting, but why this constructor isn't updated to take care of different type of the underlying cache? That's especially strange given the funny code below for the next constructor when the 'capacity' parameter is ignored upon using the SLRU type of the cache. http://gerrit.cloudera.org:8080/#/c/21390/2/src/kudu/cfile/block_cache.cc@242 PS2, Line 242: CreateCache(FLAGS_block_cache_probationary_segment_capacity_mb * 1024 * 1024, : FLAGS_block_cache_protected_segment_capacity_mb * 1024 * 1024, : FLAGS_block_cache_lookups_before_upgrade)) { That's a bad smell -- how 'capacity' is then used in this branch? http://gerrit.cloudera.org:8080/#/c/21390/2/src/kudu/cfile/block_cache.cc@247 PS2, Line 247: BlockCache::BlockCache(size_t probationary_segment_capacity, :size_t protected_segment_capacity, :size_t lookups) : : cache_(CreateCache(probationary_segment_capacity, protected_segment_capacity, lookups)) { : } With this code and the code above it smells even worse -- now there is a complete mess with trying to understand when a particular type of cache is created based on the constructor invoked and the --block_cache_eviction_policy setting. -- To view, visit http://gerrit.cloudera.org:8080/21390 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I04411ab2756045f15a272f3397d46d871b087b03 Gerrit-Change-Number: 21390 Gerrit-PatchSet: 2 Gerrit-Owner: Mahesh Reddy Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Comment-Date: Fri, 10 May 2024 18:29:02 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-613: Integrate SLRU cache into block cache
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21390 ) Change subject: KUDU-613: Integrate SLRU cache into block cache .. Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/21390/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21390/2//COMMIT_MSG@13 PS2, Line 13: The default : value is still the old LRU cache. I think it's crucial to enable at least a few integration-style scenarios using the block cache backed by the new SLRU implementation. I'd think of scenarios in tablet_server_stress-test.cc, full_stack-insert-scan-test.cc, dense_node-itest.cc and heavy-update-compaction-itest.cc http://gerrit.cloudera.org:8080/#/c/21390/2/src/kudu/cfile/block_cache.cc File src/kudu/cfile/block_cache.cc: http://gerrit.cloudera.org:8080/#/c/21390/2/src/kudu/cfile/block_cache.cc@46 PS2, Line 46: Either 'LRU' (default) or 'SLRU' (experimental Please add a validator for the value, and make this flag case-insensitive, so "LRU" and "lru" both would work (as well as "lRu"). http://gerrit.cloudera.org:8080/#/c/21390/2/src/kudu/cfile/block_cache.cc@46 PS2, Line 46: nit: drop the trailing space? http://gerrit.cloudera.org:8080/#/c/21390/2/src/kudu/cfile/block_cache.cc@49 PS2, Line 49: block_cache_probationary_segment_capacity_mb For this the block_cache_protected_segment_capacity_mb flag below: Does it make sense to add a validator for this flag to reject at least negative values? From the other side, why not to use unsigned type for this flag? http://gerrit.cloudera.org:8080/#/c/21390/2/src/kudu/cfile/block_cache.cc@50 PS2, Line 50: MB here and below: is this actually MiB/MiBytes? http://gerrit.cloudera.org:8080/#/c/21390/2/src/kudu/cfile/block_cache.cc@58 PS2, Line 58: ," nit: reconsider the punctuation and the result message for the flag http://gerrit.cloudera.org:8080/#/c/21390/2/src/kudu/cfile/block_cache.cc@171 PS2, Line 171: "Lower --block_cache_probationary_segment_capacity_mb and" nit: add space after 'and' http://gerrit.cloudera.org:8080/#/c/21390/2/src/kudu/cfile/block_cache.cc@194 PS2, Line 194: FLAGS_block_cache_eviction_policy Please make sure the value of this flag is treated as case-insensitive. Doing that in just one place and then reusing the function that converts a string into a enumeration should be robust enough. http://gerrit.cloudera.org:8080/#/c/21390/2/src/kudu/cfile/block_cache.cc@222 PS2, Line 222: ToUpperCase(FLAGS_block_cache_eviction_policy, _block_cache_eviction_policy); I'm not sure it's a good idea to update the actual value of the flag like this: it might quite unexpected behavior from the user's perspective. Why not to use a temporary variable for converting to the upper case? -- To view, visit http://gerrit.cloudera.org:8080/21390 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I04411ab2756045f15a272f3397d46d871b087b03 Gerrit-Change-Number: 21390 Gerrit-PatchSet: 2 Gerrit-Owner: Mahesh Reddy Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Comment-Date: Fri, 10 May 2024 17:56:59 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-613: Introduce new BlockCache metrics
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21389 ) Change subject: KUDU-613: Introduce new BlockCache metrics .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/21389/2/src/kudu/util/block_cache_metrics.cc File src/kudu/util/block_cache_metrics.cc: PS2: > The new metrics give us information on inserts into the protected segment o >From the usability perspective, I'd think of all the metrics that allow for >computing SLRU cache's hit and miss ratios and show how many inserts and >evictions happened. That's what end-user is about to look at. Yes, most >likely it's possible to deduce that from counters available for the >probational and protected segments (but I haven't verified that), but where >one can find the rules to perform such computations and how many mistakes a >person is going to make when trying to compute those? I'd think of providing top-level metrics for the SLRU cache as we currently have for the LRU cache just for better usability. http://gerrit.cloudera.org:8080/#/c/21389/2/src/kudu/util/block_cache_metrics.cc@176 PS2, Line 176: nit for here and below: this isn't aligned with current Kudu style guide -- To view, visit http://gerrit.cloudera.org:8080/21389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c8181ec6bea301605aaef7db0003c3eaef3072d Gerrit-Change-Number: 21389 Gerrit-PatchSet: 2 Gerrit-Owner: Mahesh Reddy Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Comment-Date: Fri, 10 May 2024 17:50:01 + Gerrit-HasComments: Yes
[kudu-CR] [master] fix race in auto leader rebalancing
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21417 ) Change subject: [master] fix race in auto leader rebalancing .. [master] fix race in auto leader rebalancing It turned out that auto leader rebalancing task wasn't explicitly shutdown upon shutting down catalog manager. That lead to race conditions as reported by TSAN, at least in test scenarios (see below). This patch addresses the issue. WARNING: ThreadSanitizer: data race (pid=23827) Write of size 1 at 0x7b408208 by main thread: #0 AnnotateRWLockDestroy thirdparty/src/llvm-11.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cpp:264 (auto_rebalancer-test+0x33575e) #1 kudu::rw_spinlock::~rw_spinlock() src/kudu/util/locks.h:89:5 (libmaster.so+0x359376) #2 kudu::master::TSManager::~TSManager() src/kudu/master/ts_manager.cc:108:1 (libmaster.so+0x4ad201) #3 kudu::master::TSManager::~TSManager() src/kudu/master/ts_manager.cc:107:25 (libmaster.so+0x4ad229) #4 std::__1::default_delete::operator()(kudu::master::TSManager*) const thirdparty/installed/tsan/include/c++/v1/memory:2262:5 (libmaster.so+0x407ce7) #5 std::__1::unique_ptr >::reset(kudu::master::TSManager*) thirdparty/installed/tsan/include/c++/v1/memory:2517:7 (libmaster.so+0x40157d) #6 std::__1::unique_ptr >::~unique_ptr() thirdparty/installed/tsan/include/c++/v1/memory:2471:19 (libmaster.so+0x4015eb) #7 kudu::master::Master::~Master() src/kudu/master/master.cc:263:1 (libmaster.so+0x3f7a4a) #8 kudu::master::Master::~Master() src/kudu/master/master.cc:261:19 (libmaster.so+0x3f7dc9) #9 std::__1::default_delete::operator()(kudu::master::Master*) const thirdparty/installed/tsan/include/c++/v1/memory:2262:5 (libmaster.so+0x435627) #10 std::__1::unique_ptr >::reset(kudu::master::Master*) thirdparty/installed/tsan/include/c++/v1/memory:2517:7 (libmaster.so+0x42e6ed) #11 kudu::master::MiniMaster::Shutdown() src/kudu/master/mini_master.cc:120:13 (libmaster.so+0x4c2612) ... Previous atomic write of size 4 at 0x7b408208 by thread T439 (mutexes: write M1141235379631443968): #0 __tsan_atomic32_compare_exchange_strong thirdparty/src/llvm-11.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp:780 (auto_rebalancer-test+0x33eb60) #1 base::subtle::Release_CompareAndSwap(int volatile*, int, int) /src/kudu/gutil/atomicops-internals-tsan.h:88:3 (libmaster.so+0x2e2b34) #2 kudu::rw_semaphore::unlock_shared() src/kudu/util/rw_semaphore.h:91:19 (libmaster.so+0x2e29c8) #3 kudu::rw_spinlock::unlock_shared() src/kudu/util/locks.h:99:10 (libmaster.so+0x2e28ef) #4 std::__1::shared_lock::~shared_lock() /thirdparty/installed/tsan/include/c++/v1/shared_mutex:369:19 (libmaster.so+0x2e23e0) #5 kudu::master::TSManager::GetAllDescriptors(std::__1::vector, std::__1::allocator > >*) const src/kudu/master/ts_manager.cc:206:1 (libmaster.so+0x4adeb6) #6 kudu::master::AutoLeaderRebalancerTask::RunLeaderRebalancer() src/kudu/master/auto_leader_rebalancer.cc:405:16 (libmaster.so+0x2fb51b) #7 kudu::master::AutoLeaderRebalancerTask::RunLoop() src/kudu/master/auto_leader_rebalancer.cc:445:7 (libmaster.so+0x2fbaa9) This is a follow-up to 10efaf2c77dfe5e4474505e0267c583c011703be. Change-Id: Iccd66d00280d22b37386230874937e5260f07f3b Reviewed-on: http://gerrit.cloudera.org:8080/21417 Reviewed-by: Wang Xixu <1450306...@qq.com> Tested-by: Alexey Serbin Reviewed-by: Yifan Zhang --- M src/kudu/master/auto_leader_rebalancer.cc M src/kudu/master/catalog_manager.cc 2 files changed, 9 insertions(+), 1 deletion(-) Approvals: Wang Xixu: Looks good to me, but someone else must approve Alexey Serbin: Verified Yifan Zhang: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21417 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iccd66d00280d22b37386230874937e5260f07f3b Gerrit-Change-Number: 21417 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Yingchun Lai
[kudu-CR] [master] fix race in auto leader rebalancing
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21417 ) Change subject: [master] fix race in auto leader rebalancing .. Patch Set 1: Verified+1 unrelated failure in java tests -- To view, visit http://gerrit.cloudera.org:8080/21417 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iccd66d00280d22b37386230874937e5260f07f3b Gerrit-Change-Number: 21417 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Fri, 10 May 2024 03:04:31 + Gerrit-HasComments: No
[kudu-CR] [master] fix race in auto leader rebalancing
Alexey Serbin has removed a vote on this change. Change subject: [master] fix race in auto leader rebalancing .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/21417 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: Iccd66d00280d22b37386230874937e5260f07f3b Gerrit-Change-Number: 21417 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Yingchun Lai
[kudu-CR] [common] get rid of MutexLock
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21415 ) Change subject: [common] get rid of MutexLock .. Patch Set 2: Verified+1 unrelated test failures: raft_consensus-itest and Java tests -- To view, visit http://gerrit.cloudera.org:8080/21415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I49e0ef2c688ef8be74d018bb9bffe70b6655e654 Gerrit-Change-Number: 21415 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber Gerrit-Comment-Date: Thu, 09 May 2024 23:35:48 + Gerrit-HasComments: No
[kudu-CR] [common] get rid of MutexLock
Alexey Serbin has removed a vote on this change. Change subject: [common] get rid of MutexLock .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/21415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I49e0ef2c688ef8be74d018bb9bffe70b6655e654 Gerrit-Change-Number: 21415 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber
[kudu-CR] [common] get rid of MutexLock
Hello Marton Greber, Attila Bukor, Kudu Jenkins, Abhishek Chennaka, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21415 to look at the new patch set (#2). Change subject: [common] get rid of MutexLock .. [common] get rid of MutexLock Since contemporary STL library provides both std::lock_guard and std::unique_lock, there is no need to keep MutexLock. Change-Id: I49e0ef2c688ef8be74d018bb9bffe70b6655e654 --- M src/kudu/clock/builtin_ntp.cc M src/kudu/master/auto_leader_rebalancer.cc M src/kudu/master/catalog_manager.cc M src/kudu/rpc/service_pool.cc M src/kudu/rpc/service_pool.h M src/kudu/rpc/service_queue.h M src/kudu/server/diagnostics_log.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tools/table_scanner.cc M src/kudu/tserver/heartbeater.cc M src/kudu/tserver/scanners.cc M src/kudu/tserver/scanners.h M src/kudu/tserver/tablet_copy_service.cc M src/kudu/tserver/tablet_copy_service.h M src/kudu/util/async_logger.cc M src/kudu/util/async_logger.h M src/kudu/util/barrier.h M src/kudu/util/blocking_queue-test.cc M src/kudu/util/blocking_queue.h M src/kudu/util/cloud/instance_detector.cc M src/kudu/util/countdown_latch.h M src/kudu/util/debug/trace_event_impl.cc M src/kudu/util/debug/trace_event_synthetic_delay.cc M src/kudu/util/kernel_stack_watchdog.cc M src/kudu/util/mem_tracker.cc M src/kudu/util/mem_tracker.h M src/kudu/util/mutex.h M src/kudu/util/pstack_watcher.cc M src/kudu/util/rwc_lock.cc M src/kudu/util/test_graph.cc M src/kudu/util/threadpool.cc M src/kudu/util/threadpool.h 33 files changed, 154 insertions(+), 195 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/21415/2 -- To view, visit http://gerrit.cloudera.org:8080/21415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I49e0ef2c688ef8be74d018bb9bffe70b6655e654 Gerrit-Change-Number: 21415 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber
[kudu-CR] [common] switch from unique lock to lock guard
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21414 ) Change subject: [common] switch from unique_lock to lock_guard .. Patch Set 2: Verified+1 unrelated test flakiness: some well-known flakies and a new one (apparently, not related to this patch) https://issues.apache.org/jira/browse/KUDU-3573 -- To view, visit http://gerrit.cloudera.org:8080/21414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I72d86ca730113ec0652154d5ce509fc2e479befb Gerrit-Change-Number: 21414 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Thu, 09 May 2024 20:26:07 + Gerrit-HasComments: No
[kudu-CR] [common] switch from unique lock to lock guard
Alexey Serbin has removed a vote on this change. Change subject: [common] switch from unique_lock to lock_guard .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/21414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I72d86ca730113ec0652154d5ce509fc2e479befb Gerrit-Change-Number: 21414 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [master] fix race in auto leader rebalancing
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21417 Change subject: [master] fix race in auto leader rebalancing .. [master] fix race in auto leader rebalancing It turned out that auto leader rebalancing task wasn't explicitly shutdown upon shutting down catalog manager. That lead to race conditions as reported by TSAN, at least in test scenarios (see below). This patch addresses the issue. WARNING: ThreadSanitizer: data race (pid=23827) Write of size 1 at 0x7b408208 by main thread: #0 AnnotateRWLockDestroy thirdparty/src/llvm-11.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cpp:264 (auto_rebalancer-test+0x33575e) #1 kudu::rw_spinlock::~rw_spinlock() src/kudu/util/locks.h:89:5 (libmaster.so+0x359376) #2 kudu::master::TSManager::~TSManager() src/kudu/master/ts_manager.cc:108:1 (libmaster.so+0x4ad201) #3 kudu::master::TSManager::~TSManager() src/kudu/master/ts_manager.cc:107:25 (libmaster.so+0x4ad229) #4 std::__1::default_delete::operator()(kudu::master::TSManager*) const thirdparty/installed/tsan/include/c++/v1/memory:2262:5 (libmaster.so+0x407ce7) #5 std::__1::unique_ptr >::reset(kudu::master::TSManager*) thirdparty/installed/tsan/include/c++/v1/memory:2517:7 (libmaster.so+0x40157d) #6 std::__1::unique_ptr >::~unique_ptr() thirdparty/installed/tsan/include/c++/v1/memory:2471:19 (libmaster.so+0x4015eb) #7 kudu::master::Master::~Master() src/kudu/master/master.cc:263:1 (libmaster.so+0x3f7a4a) #8 kudu::master::Master::~Master() src/kudu/master/master.cc:261:19 (libmaster.so+0x3f7dc9) #9 std::__1::default_delete::operator()(kudu::master::Master*) const thirdparty/installed/tsan/include/c++/v1/memory:2262:5 (libmaster.so+0x435627) #10 std::__1::unique_ptr >::reset(kudu::master::Master*) thirdparty/installed/tsan/include/c++/v1/memory:2517:7 (libmaster.so+0x42e6ed) #11 kudu::master::MiniMaster::Shutdown() src/kudu/master/mini_master.cc:120:13 (libmaster.so+0x4c2612) ... Previous atomic write of size 4 at 0x7b408208 by thread T439 (mutexes: write M1141235379631443968): #0 __tsan_atomic32_compare_exchange_strong thirdparty/src/llvm-11.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp:780 (auto_rebalancer-test+0x33eb60) #1 base::subtle::Release_CompareAndSwap(int volatile*, int, int) /src/kudu/gutil/atomicops-internals-tsan.h:88:3 (libmaster.so+0x2e2b34) #2 kudu::rw_semaphore::unlock_shared() src/kudu/util/rw_semaphore.h:91:19 (libmaster.so+0x2e29c8) #3 kudu::rw_spinlock::unlock_shared() src/kudu/util/locks.h:99:10 (libmaster.so+0x2e28ef) #4 std::__1::shared_lock::~shared_lock() /thirdparty/installed/tsan/include/c++/v1/shared_mutex:369:19 (libmaster.so+0x2e23e0) #5 kudu::master::TSManager::GetAllDescriptors(std::__1::vector, std::__1::allocator > >*) const src/kudu/master/ts_manager.cc:206:1 (libmaster.so+0x4adeb6) #6 kudu::master::AutoLeaderRebalancerTask::RunLeaderRebalancer() src/kudu/master/auto_leader_rebalancer.cc:405:16 (libmaster.so+0x2fb51b) #7 kudu::master::AutoLeaderRebalancerTask::RunLoop() src/kudu/master/auto_leader_rebalancer.cc:445:7 (libmaster.so+0x2fbaa9) This is a follow-up to 10efaf2c77dfe5e4474505e0267c583c011703be. Change-Id: Iccd66d00280d22b37386230874937e5260f07f3b --- M src/kudu/master/auto_leader_rebalancer.cc M src/kudu/master/catalog_manager.cc 2 files changed, 9 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/21417/1 -- To view, visit http://gerrit.cloudera.org:8080/21417 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iccd66d00280d22b37386230874937e5260f07f3b Gerrit-Change-Number: 21417 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin
[kudu-CR] [common] switch from unique lock to lock guard
Hello Attila Bukor, Kudu Jenkins, Abhishek Chennaka, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21414 to look at the new patch set (#2). Change subject: [common] switch from unique_lock to lock_guard .. [common] switch from unique_lock to lock_guard To simplify working with STL synchronization primitives, this patch changes from std::unique_lock to std::lock_guard where appropriate. In addition, IWYU and ClangTidy's feedback has been addressed. Change-Id: I72d86ca730113ec0652154d5ce509fc2e479befb --- M src/kudu/clock/hybrid_clock.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/log_cache.cc M src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/rpc/rpcz_store.cc M src/kudu/security/tls_context.cc M src/kudu/server/webserver.cc M src/kudu/tablet/lock_manager-test.cc M src/kudu/tablet/lock_manager.cc M src/kudu/tablet/tablet.cc M src/kudu/tserver/scanners.cc M src/kudu/util/maintenance_manager.cc M src/kudu/util/trace_metrics.h 14 files changed, 36 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/14/21414/2 -- To view, visit http://gerrit.cloudera.org:8080/21414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I72d86ca730113ec0652154d5ce509fc2e479befb Gerrit-Change-Number: 21414 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [common] get rid of MutexLock
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21415 Change subject: [common] get rid of MutexLock .. [common] get rid of MutexLock Since contemporary STL library provides both std::lock_guard and std::unique_lock, there is no need to keep MutexLock. Change-Id: I49e0ef2c688ef8be74d018bb9bffe70b6655e654 --- M src/kudu/clock/builtin_ntp.cc M src/kudu/master/auto_leader_rebalancer.cc M src/kudu/master/catalog_manager.cc M src/kudu/rpc/service_pool.cc M src/kudu/rpc/service_pool.h M src/kudu/rpc/service_queue.h M src/kudu/server/diagnostics_log.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tools/table_scanner.cc M src/kudu/tserver/heartbeater.cc M src/kudu/tserver/scanners.cc M src/kudu/tserver/scanners.h M src/kudu/tserver/tablet_copy_service.cc M src/kudu/tserver/tablet_copy_service.h M src/kudu/util/async_logger.cc M src/kudu/util/async_logger.h M src/kudu/util/barrier.h M src/kudu/util/blocking_queue-test.cc M src/kudu/util/blocking_queue.h M src/kudu/util/cloud/instance_detector.cc M src/kudu/util/countdown_latch.h M src/kudu/util/debug/trace_event_impl.cc M src/kudu/util/debug/trace_event_synthetic_delay.cc M src/kudu/util/kernel_stack_watchdog.cc M src/kudu/util/mem_tracker.cc M src/kudu/util/mem_tracker.h M src/kudu/util/mutex.h M src/kudu/util/pstack_watcher.cc M src/kudu/util/rwc_lock.cc M src/kudu/util/test_graph.cc M src/kudu/util/threadpool.cc M src/kudu/util/threadpool.h 33 files changed, 155 insertions(+), 195 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/21415/1 -- To view, visit http://gerrit.cloudera.org:8080/21415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I49e0ef2c688ef8be74d018bb9bffe70b6655e654 Gerrit-Change-Number: 21415 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin
[kudu-CR] [common] switch from unique lock to lock guard
Alexey Serbin has removed a vote on this change. Change subject: [common] switch from unique_lock to lock_guard .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/21414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I72d86ca730113ec0652154d5ce509fc2e479befb Gerrit-Change-Number: 21414 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [common] switch from unique lock to lock guard
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21414 ) Change subject: [common] switch from unique_lock to lock_guard .. Patch Set 1: Verified+1 unrelated test failure: KUDU-1495 -- To view, visit http://gerrit.cloudera.org:8080/21414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I72d86ca730113ec0652154d5ce509fc2e479befb Gerrit-Change-Number: 21414 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Thu, 09 May 2024 04:05:05 + Gerrit-HasComments: No
[kudu-CR] [common] switch from unique lock to lock guard
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21414 Change subject: [common] switch from unique_lock to lock_guard .. [common] switch from unique_lock to lock_guard To simplify working with STL synchronization primitives, this patch changes from std::lock_guard to std::unique_lock where appropriate. In addition, IWYU and ClangTidy's feedback has been addressed. This patch doesn't contain any functional modifications. Change-Id: I72d86ca730113ec0652154d5ce509fc2e479befb --- M src/kudu/clock/hybrid_clock.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/log_cache.cc M src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/rpc/rpcz_store.cc M src/kudu/security/tls_context.cc M src/kudu/server/webserver.cc M src/kudu/tablet/lock_manager-test.cc M src/kudu/tablet/lock_manager.cc M src/kudu/tablet/tablet.cc M src/kudu/tserver/scanners.cc M src/kudu/util/maintenance_manager.cc M src/kudu/util/trace_metrics.h 14 files changed, 36 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/14/21414/1 -- To view, visit http://gerrit.cloudera.org:8080/21414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I72d86ca730113ec0652154d5ce509fc2e479befb Gerrit-Change-Number: 21414 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin
[kudu-CR] KUDU-613: Cleanup of cache code
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21018 ) Change subject: KUDU-613: Cleanup of cache code .. KUDU-613: Cleanup of cache code This patch moves some classes out of the anonymous namespace and into the headers of the cache and nvm_cache files. These classes will be used by the new SLRU cache. This path also templatizes the HandleTable class to be used by both the cache and nvm_cache files. Change-Id: I506d4577c0ae873b01d7fa4f53846d6fd0f664cf Reviewed-on: http://gerrit.cloudera.org:8080/21018 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin --- M src/kudu/util/cache.cc M src/kudu/util/cache.h M src/kudu/util/file_cache.cc M src/kudu/util/nvm_cache.cc M src/kudu/util/nvm_cache.h 5 files changed, 206 insertions(+), 287 deletions(-) Approvals: Kudu Jenkins: Verified Alexey Serbin: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21018 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I506d4577c0ae873b01d7fa4f53846d6fd0f664cf Gerrit-Change-Number: 21018 Gerrit-PatchSet: 10 Gerrit-Owner: Mahesh Reddy Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-613: Cleanup of cache code
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21018 ) Change subject: KUDU-613: Cleanup of cache code .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21018 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I506d4577c0ae873b01d7fa4f53846d6fd0f664cf Gerrit-Change-Number: 21018 Gerrit-PatchSet: 9 Gerrit-Owner: Mahesh Reddy Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Wed, 08 May 2024 16:57:50 + Gerrit-HasComments: No
[kudu-CR] [alter table-test] Increase assertion timeout
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21347 ) Change subject: [alter_table-test] Increase assertion timeout .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/21347/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21347/1//COMMIT_MSG@7 PS1, Line 7: [alter_table-test] Increase assertion timeout There is a Jira issue about flakiness of this particular scenario: KUDU-3509 Does it make sense to refer to JIRA in the summary? http://gerrit.cloudera.org:8080/#/c/21347/1//COMMIT_MSG@10 PS1, Line 10: So timeout was increased to 60 seconds. Does it actually help making this test more stable? If the test in RELEASE build failed the same way as mentioned in KUDU-3509, I'm not sure this patch could help with the issue. Could you provide results of running the test before and after the fix via dist-test, at least for RELEASE and DEBUG builds? http://gerrit.cloudera.org:8080/#/c/21347/1//COMMIT_MSG@9 PS1, Line 9: On ubuntu 22.04 release build, AlterReplicationFactorWhileWriting test : failed with timeout. Could you attach the log to KUDU-3509? -- To view, visit http://gerrit.cloudera.org:8080/21347 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifc51365a798569c22e680eb3e7723a5f330d43c5 Gerrit-Change-Number: 21347 Gerrit-PatchSet: 1 Gerrit-Owner: Ádám Bakai Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Comment-Date: Wed, 08 May 2024 14:16:48 + Gerrit-HasComments: Yes
[kudu-CR] [metrics] Add metrics for create and delete op time
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21316 ) Change subject: [metrics] Add metrics for create and delete op time .. [metrics] Add metrics for create and delete op time Add server-level statistics to track the time consumption of create tablet and delete tablet operations. The addition of monitoring items will aid in historical issue tracking and analysis, as well as facilitate the configuration of monitoring alarms. Change-Id: I02bd52013caa94a33143cb16ff3831a49b74bac4 Reviewed-on: http://gerrit.cloudera.org:8080/21316 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin --- M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/ts_tablet_manager-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 4 files changed, 82 insertions(+), 0 deletions(-) Approvals: Kudu Jenkins: Verified Alexey Serbin: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21316 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I02bd52013caa94a33143cb16ff3831a49b74bac4 Gerrit-Change-Number: 21316 Gerrit-PatchSet: 6 Gerrit-Owner: KeDeng Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: KeDeng Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [metrics] Add metrics for create and delete op time
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21316 ) Change subject: [metrics] Add metrics for create and delete op time .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21316 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bd52013caa94a33143cb16ff3831a49b74bac4 Gerrit-Change-Number: 21316 Gerrit-PatchSet: 5 Gerrit-Owner: KeDeng Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: KeDeng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 07 May 2024 14:04:00 + Gerrit-HasComments: No
[kudu-CR] [CMakeLists] Make kudu test main static
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21282 ) Change subject: [CMakeLists] Make kudu_test_main static .. Patch Set 7: Verified+1 unrelated test failures -- To view, visit http://gerrit.cloudera.org:8080/21282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0dfeb1fa04ed91e95fd1f8d789f020dd44289fea Gerrit-Change-Number: 21282 Gerrit-PatchSet: 7 Gerrit-Owner: Ádám Bakai Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Zoltan Martonka Gerrit-Reviewer: Ádám Bakai Gerrit-Comment-Date: Tue, 07 May 2024 07:26:40 + Gerrit-HasComments: No
[kudu-CR] [CMakeLists] Make kudu test main static
Alexey Serbin has removed a vote on this change. Change subject: [CMakeLists] Make kudu_test_main static .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/21282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I0dfeb1fa04ed91e95fd1f8d789f020dd44289fea Gerrit-Change-Number: 21282 Gerrit-PatchSet: 7 Gerrit-Owner: Ádám Bakai Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Zoltan Martonka Gerrit-Reviewer: Ádám Bakai
[kudu-CR] [metrics] Add metrics for create and delete op time
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21316 ) Change subject: [metrics] Add metrics for create and delete op time .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/21316/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21316/1//COMMIT_MSG@11 PS1, Line 11: These monitoring metrics will be very helpful for analyzing : issues related to high CPU usage. > Thank you very much for your response. Yes: in general that makes sense, of course. I was just trying to say that creating/deleting a tablet replica is mostly disk IO, but not many CPU cycles. Adding metrics here and there might be a guessing game. If the goal is to spot CPU bottlenecks, I can also recommend using built-in tracing: https://kudu.apache.org/docs/troubleshooting.html#kudu_tracing In addition, running 'htop -p ' and performing stracing, etc. could pin-point particular threads that consume a lot of CPU. http://gerrit.cloudera.org:8080/#/c/21316/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21316/4//COMMIT_MSG@11 PS4, Line 11: analyzing : issues related to high CPU usage How the stats on the duration of create/delete a tablet replica could help with analyzing high CPU usage scenarios FWIW, most of the activity while creating/deleting a tablet is attributed to disk IO. If you are interested in tracking CPU usage of some activity (not attributed to IO wait times), I could recommend taking a look at the 'reactor_load_percent' metric. I hope this helps. http://gerrit.cloudera.org:8080/#/c/21316/4/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/21316/4/src/kudu/tserver/ts_tablet_manager.cc@273 PS4, Line 273: What is the significance of this 'on the current node' part? All the tablet metrics are attributed to the node where the tablet replica is hosted, no? If so, maybe drop this part? http://gerrit.cloudera.org:8080/#/c/21316/4/src/kudu/tserver/ts_tablet_manager.cc@274 PS4, Line 274: Why kInfo, not kDebug? Looking at metrics like 'tablets_opening_time_startup', it seems this sort of metric is something that would be used mostly for troubleshooting. http://gerrit.cloudera.org:8080/#/c/21316/4/src/kudu/tserver/ts_tablet_manager.cc@281 PS4, Line 281: ditto: maybe, kDebug is a better choice here? http://gerrit.cloudera.org:8080/#/c/21316/4/src/kudu/tserver/ts_tablet_manager.cc@1168 PS4, Line 1168: Shouldn't the delete_tablet_run_time_ metric be updated before return here as well? -- To view, visit http://gerrit.cloudera.org:8080/21316 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bd52013caa94a33143cb16ff3831a49b74bac4 Gerrit-Change-Number: 21316 Gerrit-PatchSet: 1 Gerrit-Owner: KeDeng Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: KeDeng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 07 May 2024 07:26:16 + Gerrit-HasComments: Yes
[kudu-CR] [CMakeLists] Make kudu test main static
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21282 ) Change subject: [CMakeLists] Make kudu_test_main static .. [CMakeLists] Make kudu_test_main static This change makes sure that kudu_util.so is loaded at the beginning of the library search order. This is needed because in unwind_safeness.cc there is a dlsym execution in the constructor function, that creates a wrapper for dlopen and dlclose to prevent a potential deadlock during unwind stack resolve. It looks for the next declaration of the functions called "dlopen" and "dlclose" in shared object files. If kudu_util is loaded too late, then it won't find these functions and throws an error. This happens in ubuntu 22.04 test runs. To solve this issue, the kudu_test_main was changed to a static library and kudu_util was moved to the front of the library list. This is a best effort fix, and it should only have impact on test execution. Change-Id: I0dfeb1fa04ed91e95fd1f8d789f020dd44289fea Reviewed-on: http://gerrit.cloudera.org:8080/21282 Tested-by: Alexey Serbin Reviewed-by: Alexey Serbin --- M src/kudu/util/CMakeLists.txt 1 file changed, 2 insertions(+), 2 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/21282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I0dfeb1fa04ed91e95fd1f8d789f020dd44289fea Gerrit-Change-Number: 21282 Gerrit-PatchSet: 8 Gerrit-Owner: Ádám Bakai Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Zoltan Martonka Gerrit-Reviewer: Ádám Bakai
[kudu-CR] [CMakeLists] Make kudu test main static
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21282 ) Change subject: [CMakeLists] Make kudu_test_main static .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0dfeb1fa04ed91e95fd1f8d789f020dd44289fea Gerrit-Change-Number: 21282 Gerrit-PatchSet: 7 Gerrit-Owner: Ádám Bakai Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Zoltan Martonka Gerrit-Reviewer: Ádám Bakai Gerrit-Comment-Date: Tue, 07 May 2024 07:28:09 + Gerrit-HasComments: No
[kudu-CR] [CMakeLists] Make kudu test main static
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21282 ) Change subject: [CMakeLists] Make kudu_test_main static .. Patch Set 7: Alright, if this helps in your environment, probably there is some value in this changelist. I couldn't see this happening in my Ubuntu 22.04 environments, though. Probably, there is something quite specific in your case and/or configuration. -- To view, visit http://gerrit.cloudera.org:8080/21282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0dfeb1fa04ed91e95fd1f8d789f020dd44289fea Gerrit-Change-Number: 21282 Gerrit-PatchSet: 7 Gerrit-Owner: Ádám Bakai Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Zoltan Martonka Gerrit-Reviewer: Ádám Bakai Gerrit-Comment-Date: Tue, 07 May 2024 07:28:08 + Gerrit-HasComments: No
[kudu-CR](gh-pages) [blog] Fix typo in blogpost
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21378 ) Change subject: [blog] Fix typo in blogpost .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21378 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: Iec743b5151ac9c139d3b3513f99d0369cfe0fc93 Gerrit-Change-Number: 21378 Gerrit-PatchSet: 4 Gerrit-Owner: Abhishek Chennaka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Comment-Date: Mon, 06 May 2024 22:13:29 + Gerrit-HasComments: No
[kudu-CR] [security-flags-itest] Fix missing command line flags
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21399 ) Change subject: [security-flags-itest] Fix missing command line flags .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/21399/1/src/kudu/integration-tests/security-flags-itest.cc File src/kudu/integration-tests/security-flags-itest.cc: http://gerrit.cloudera.org:8080/#/c/21399/1/src/kudu/integration-tests/security-flags-itest.cc@54 PS1, Line 54: ASSERT_EQ(FLAGS_rpc_authentication, "required"); nit for {ASSERT,EXPECT}_EQ: the expected value comes first -- that way it's easier to comprehend the error message upon triggering an assertion -- To view, visit http://gerrit.cloudera.org:8080/21399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec751e8761562612d97b886740c9b20cd134a0bc Gerrit-Change-Number: 21399 Gerrit-PatchSet: 1 Gerrit-Owner: Ádám Bakai Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Comment-Date: Mon, 06 May 2024 16:10:49 + Gerrit-HasComments: Yes
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 10: (1 comment) > (1 comment) > > I realized that, the only proper way to add a 2-way load barrier is > with > std::atomic_thread_fence(std::memory_order_acquire), which is more > strict, than > the 1-way std::atomic::load(std::memory_order_acquire); > > I made some performance test. I used TestScanPerformance from > cbtree-test and > TestMemRowSetUpdatePerformance from memrowset-test. I also added 2 > more > performance tests, one using basic arena, one using > ThreadSafeMemoryTrackingArena (we only use the 2nd in production). > I will add g++ tests too (as soon as it compiles). > See the branches sheet for name resolution. For some reason, I assumed you were going to add those tests into this patch or as a separate patch, but it seems by 'adding > https://docs.google.com/spreadsheets/d/1whl8h7S0DcjVYp0jksR87P9v4U8TQH9nS1C9N0D-lUo/edit?usp=sharing > (1 comment) > > I realized that, the only proper way to add a 2-way load barrier is > with > std::atomic_thread_fence(std::memory_order_acquire), which is more > strict, than > the 1-way std::atomic::load(std::memory_order_acquire); > > I made some performance test. I used TestScanPerformance from > cbtree-test and > TestMemRowSetUpdatePerformance from memrowset-test. I also added 2 > more > performance tests, one using basic arena, one using > ThreadSafeMemoryTrackingArena (we only use the 2nd in production). > I will add g++ tests too (as soon as it compiles). > See the branches sheet for name resolution. For some reason, I assumed you were going to add your new tests as a part of this changelist as well, but it seems you just meant adding results from running those tests into the spreadsheet :) OK, maybe it makes sense to add your new tests into Kudu as well? Feel free to post that as a separate changelist. Also, it would be great if you could address other feedback items (nits, questions of other reviewers, etc.) and post a follow-up revision, if necessary. Thanks! > https://docs.google.com/spreadsheets/d/1whl8h7S0DcjVYp0jksR87P9v4U8TQH9nS1C9N0D-lUo/edit?usp=sharing http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG@33 PS6, Line 33: The following 6 barriers need to be more strict: > See the google sheet. Thanks for posting the numbers. A few questions: * What were the machines that x86_64 and ARM tests were run at? How close they are in terms of processing power (frequency, number of CPU cores)? * Do I understand correctly that both the x86_64 and ARM machines were the same for all the tests, just different binaries were run? * I'm not sure I understand what 'x86_master_clang_2' at the Summary tab is for? How is it different from 'x86_master_clang'? * The numbers posted at the Summary tab: are those seconds of test runtime or that's some completely different (e.g., number scanned/inserted keys), or for different tests they are of different units? * The numbers posted at the Summary tab for the 'TestMemRowSetUpdatePerformance' test: what are those? Are those just total runtime or some particular timing report part (inserting, counting, updating) of the test? Also, could you run existing corresponding Kudu tests (you can run you new tests as well, of course) prior and post your gerrit patch under the 'perf' utility (say, 10 runs) and report the results. No need to put them into a spreadsheet, just present the output from the 'perf stat' utility. You can browse 'git log' and search for the examples how running those tests is usually done. You might also be interested in perf-mem and may be in perf-lock, but reports just from generic 'perf' would be great already. Thanks a lot! https://man7.org/linux/man-pages/man1/perf.1.html -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 10 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Sat, 04 May 2024 03:06:59 + Gerrit-HasComments: Yes
[kudu-CR](gh-pages) [blog] Fix typo in blogpost
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21378 ) Change subject: [blog] Fix typo in blogpost .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21378 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: Iec743b5151ac9c139d3b3513f99d0369cfe0fc93 Gerrit-Change-Number: 21378 Gerrit-PatchSet: 3 Gerrit-Owner: Abhishek Chennaka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Comment-Date: Sat, 04 May 2024 01:43:10 + Gerrit-HasComments: No
[kudu-CR] KUDU-3216 fix flakiness in LeadershipChangeOnTskGeneration
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21397 Change subject: KUDU-3216 fix flakiness in LeadershipChangeOnTskGeneration .. KUDU-3216 fix flakiness in LeadershipChangeOnTskGeneration Before this patch, the LeadershipChangeOnTskGeneration scenario from CatalogManagerTskITest would fail once in about 50 runs, DEBUG build. With this patch, the test scenario hasn't had a single failure in about 300 runs, DEBUG build. Change-Id: I139e46627206fc13490ca405bb62dc29934dc4be --- M src/kudu/integration-tests/catalog_manager_tsk-itest.cc 1 file changed, 22 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/97/21397/1 -- To view, visit http://gerrit.cloudera.org:8080/21397 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I139e46627206fc13490ca405bb62dc29934dc4be Gerrit-Change-Number: 21397 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin
[kudu-CR] [g++11] Fix DecrementIntCell for g++10 and g++11
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21396 ) Change subject: [g++11] Fix DecrementIntCell for g++10 and g++11 .. [g++11] Fix DecrementIntCell for g++10 and g++11 There seems to be a compiler bug, that optimizes out the safety check for INT_MIN in the DecrementIntCell function. It appears on RHEL 9.2 with g++ 11.4.1. Only in Release build. For more infoi, see: https://stackoverflow.com/questions/78424303/g-optimizes-away-check-for-int-min-in-release-build The issue seems to be fixed in g++12 and not yet present in g++9. Solution: Slightly change the function to ensure it is compiled correctly. This modification should not alter the correct optimized code. Basically, any change where the compiler cannot perform the two optimization steps (in this order) should address the issue: + if (x == INT_MIN) x = INT_MAX; else x -= 1; > x -= 1 (this is equivalent on the x86 platform). + if (x - 1 < x) > if (true) (this equivalence holds only at the mathematical level). Change-Id: Ia3cea2849a88c4d7e2587ceb805cd3258652e3c5 Reviewed-on: http://gerrit.cloudera.org:8080/21396 Tested-by: Alexey Serbin Reviewed-by: Alexey Serbin --- M src/kudu/common/key_util.cc 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Alexey Serbin: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/21396 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ia3cea2849a88c4d7e2587ceb805cd3258652e3c5 Gerrit-Change-Number: 21396 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [g++11] Fix DecrementIntCell for g++10 and g++11
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21396 ) Change subject: [g++11] Fix DecrementIntCell for g++10 and g++11 .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21396 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia3cea2849a88c4d7e2587ceb805cd3258652e3c5 Gerrit-Change-Number: 21396 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 03 May 2024 18:02:57 + Gerrit-HasComments: No
[kudu-CR] [g++11] Fix DecrementIntCell for g++10 and g++11
Alexey Serbin has removed a vote on this change. Change subject: [g++11] Fix DecrementIntCell for g++10 and g++11 .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/21396 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: Ia3cea2849a88c4d7e2587ceb805cd3258652e3c5 Gerrit-Change-Number: 21396 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [g++11] Fix DecrementIntCell for g++10 and g++11
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21396 ) Change subject: [g++11] Fix DecrementIntCell for g++10 and g++11 .. Patch Set 3: Verified+1 unrelated test failures: * KUDU-3216 -- To view, visit http://gerrit.cloudera.org:8080/21396 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia3cea2849a88c4d7e2587ceb805cd3258652e3c5 Gerrit-Change-Number: 21396 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 03 May 2024 18:01:27 + Gerrit-HasComments: No
[kudu-CR] KUDU-3568 Fix budgeting constraint test by enabling preset factor
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21360 ) Change subject: KUDU-3568 Fix budgeting constraint test by enabling preset factor .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/21360/2//COMMIT_MSG Commit Message: PS2: Thank you for the write up -- I must admit I haven't read it yet to digest everything, but I'm planning to do so when I can allocate more time for this. The only feedback at this point from my side is that the test is still failing even with this fix: src/kudu/tablet/compaction-test.cc:910: Failure Value of: JoinStrings(sink.logged_msgs(), "\n") Expected: has substring "removed from compaction input due to memory constraints" Actual: "I20240503 09:52:40.236797 953582 compaction-test.cc:904] CompactRowSetsOp complete. Timing: real 0.872s\tuser 0.844s\tsys 0.028s Metrics: {\"bytes_written\":4817,\"cfile_cache_hit\":90,\"cfile_cache_hit_bytes\":4310,\"cfile_cache_miss\":330,\"cfile_cache_miss_bytes\":3794180,\"cfile_init\":41,\"delta_iterators_relevant\":40,\"dirs.queue_time_us\":1303,\"dirs.run_cpu_time_us\":824,\"dirs.run_wall_time_us\":844,\"drs_written\":1,\"lbm_read_time_us\":4281,\"lbm_reads_lt_1ms\":494,\"lbm_write_time_us\":4794,\"lbm_writes_lt_1ms\":132,\"mutex_wait_us\":223,\"num_input_rowsets\":10,\"peak_mem_usage\":2147727,\"rows_written\":20,\"spinlock_wait_cycles\":83584,\"thread_start_us\":418,\"threads_started\":7}" I20240503 09:52:40.240758 953582 test_util.cc:175] --- I20240503 09:52:40.240803 953582 test_util.cc:176] Had fatal failures, leaving test files at /tmp/kudutest-0/compaction-test.TestCompaction.TestRowSetCompactionSkipWithBudgetingConstraints.1714755106413846-953582-0 -- To view, visit http://gerrit.cloudera.org:8080/21360 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9df218cd2d8ef3709793db267d5a0d651421dbb6 Gerrit-Change-Number: 21360 Gerrit-PatchSet: 2 Gerrit-Owner: Ashwani Raina Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Fri, 03 May 2024 16:57:19 + Gerrit-HasComments: Yes
[kudu-CR] [tserver] limit number of trace metrics for TabletCopyClient
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21393 ) Change subject: [tserver] limit number of trace metrics for TabletCopyClient .. [tserver] limit number of trace metrics for TabletCopyClient The trace metrics registry assumes that the number of entries is quite small, with the current threshold of 100. In its turn, a thread pool unconditionally registers and updates its trace metrics while executing submitted tasks. Since a tablet server might host thousands of tablet replicas, it's not a good idea to include the UUID of the tablet into the name of the download thread pool spawned by every TabletCopyClient instance. This is a follow-up to 0d95304fa46ee5d96bcaa934c7660369f2860e06. Change-Id: I334aa81aaed2378e7cae558bd8bb9e0f0c970fec Reviewed-on: http://gerrit.cloudera.org:8080/21393 Tested-by: Marton Greber Reviewed-by: Marton Greber --- M src/kudu/tserver/tablet_copy_client.cc 1 file changed, 10 insertions(+), 2 deletions(-) Approvals: Marton Greber: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/21393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I334aa81aaed2378e7cae558bd8bb9e0f0c970fec Gerrit-Change-Number: 21393 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber
[kudu-CR] [tserver] limit number of trace metrics for TabletCopyClient
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21393 Change subject: [tserver] limit number of trace metrics for TabletCopyClient .. [tserver] limit number of trace metrics for TabletCopyClient The trace metrics registry assumes that the number of entries is quite small, with the current threshold of 100. In its turn, a thread pool unconditionally registers and updates its trace metrics while executing submitted tasks. Since a tablet server might host thousands of tablet replicas, it's not a good idea to include the UUID of the tablet into the name of the download thread pool spawned by every TabletCopyClient instance. This is a follow-up to 0d95304fa46ee5d96bcaa934c7660369f2860e06. Change-Id: I334aa81aaed2378e7cae558bd8bb9e0f0c970fec --- M src/kudu/tserver/tablet_copy_client.cc 1 file changed, 10 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/21393/1 -- To view, visit http://gerrit.cloudera.org:8080/21393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I334aa81aaed2378e7cae558bd8bb9e0f0c970fec Gerrit-Change-Number: 21393 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin
[kudu-CR](branch-1.17.x) KUDU-3569 fix race in CFileSet::Iterator::OptimizePKPredicates()
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21366 ) Change subject: KUDU-3569 fix race in CFileSet::Iterator::OptimizePKPredicates() .. KUDU-3569 fix race in CFileSet::Iterator::OptimizePKPredicates() This patch addresses data race reported in KUDU-3569, using the tablet schema defined by the iterator's projection instead of the schema stored in the tablet metadata file. The latter might be swapped by concurrently running AlterTable, which is the root cause of the data race. This is a follow-up to 936d7edc4e4b69d2e1f1dffc96760cb3fd57a934. Change-Id: I92daa74cb86a77a4350f42db9ca5dec3a0d4ff75 Reviewed-on: http://gerrit.cloudera.org:8080/21359 Tested-by: Alexey Serbin Reviewed-by: Abhishek Chennaka (cherry picked from commit 977f1911fbcc4d5c323d6ae7ce7c1ab100ed11ea) Conflicts: src/kudu/tablet/cfile_set.h Reviewed-on: http://gerrit.cloudera.org:8080/21366 --- M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h 2 files changed, 16 insertions(+), 6 deletions(-) Approvals: Abhishek Chennaka: Looks good to me, approved Alexey Serbin: Verified -- To view, visit http://gerrit.cloudera.org:8080/21366 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: branch-1.17.x Gerrit-MessageType: merged Gerrit-Change-Id: I92daa74cb86a77a4350f42db9ca5dec3a0d4ff75 Gerrit-Change-Number: 21366 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
[kudu-CR](branch-1.17.x) KUDU-3569 fix race in CFileSet::Iterator::OptimizePKPredicates()
Alexey Serbin has removed a vote on this change. Change subject: KUDU-3569 fix race in CFileSet::Iterator::OptimizePKPredicates() .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/21366 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: branch-1.17.x Gerrit-MessageType: deleteVote Gerrit-Change-Id: I92daa74cb86a77a4350f42db9ca5dec3a0d4ff75 Gerrit-Change-Number: 21366 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
[kudu-CR](branch-1.17.x) KUDU-3569 fix race in CFileSet::Iterator::OptimizePKPredicates()
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21366 ) Change subject: KUDU-3569 fix race in CFileSet::Iterator::OptimizePKPredicates() .. Patch Set 1: Verified+1 unrelated test failures -- To view, visit http://gerrit.cloudera.org:8080/21366 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: branch-1.17.x Gerrit-MessageType: comment Gerrit-Change-Id: I92daa74cb86a77a4350f42db9ca5dec3a0d4ff75 Gerrit-Change-Number: 21366 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Comment-Date: Thu, 02 May 2024 20:59:33 + Gerrit-HasComments: No
[kudu-CR](branch-1.17.x) KUDU-3570 fix use-after-free in MajorDeltaCompactionOp
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21365 ) Change subject: KUDU-3570 fix use-after-free in MajorDeltaCompactionOp .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21365 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: branch-1.17.x Gerrit-MessageType: comment Gerrit-Change-Id: I491c6d98bed8780bcfb62f152db471d7a260d305 Gerrit-Change-Number: 21365 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Comment-Date: Thu, 02 May 2024 20:59:01 + Gerrit-HasComments: No
[kudu-CR](branch-1.17.x) KUDU-3570 fix use-after-free in MajorDeltaCompactionOp
Alexey Serbin has removed a vote on this change. Change subject: KUDU-3570 fix use-after-free in MajorDeltaCompactionOp .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/21365 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: branch-1.17.x Gerrit-MessageType: deleteVote Gerrit-Change-Id: I491c6d98bed8780bcfb62f152db471d7a260d305 Gerrit-Change-Number: 21365 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
[kudu-CR](branch-1.17.x) KUDU-3570 fix use-after-free in MajorDeltaCompactionOp
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21365 ) Change subject: KUDU-3570 fix use-after-free in MajorDeltaCompactionOp .. KUDU-3570 fix use-after-free in MajorDeltaCompactionOp This patch addresses heap-use-after-free and data race issues reported in KUDU-3570. With this and one prior patch, neither TSAN nor ASAN reports any warnings when running alter_table-randomized-test, at least that's the stats collected from more than 100 iterations. Change-Id: I491c6d98bed8780bcfb62f152db471d7a260d305 Reviewed-on: http://gerrit.cloudera.org:8080/21362 Tested-by: Alexey Serbin Reviewed-by: Abhishek Chennaka (cherry picked from commit 3912a97cd8998ef04c4e6f9c38bd365c582e8171) Reviewed-on: http://gerrit.cloudera.org:8080/21365 Reviewed-by: Wang Xixu <1450306...@qq.com> --- M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h 2 files changed, 15 insertions(+), 10 deletions(-) Approvals: Wang Xixu: Looks good to me, but someone else must approve Abhishek Chennaka: Looks good to me, approved Alexey Serbin: Verified -- To view, visit http://gerrit.cloudera.org:8080/21365 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: branch-1.17.x Gerrit-MessageType: merged Gerrit-Change-Id: I491c6d98bed8780bcfb62f152db471d7a260d305 Gerrit-Change-Number: 21365 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
[kudu-CR] KUDU-3371 Fix the crash bug when run binaries on older CPU machines
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21287 ) Change subject: KUDU-3371 Fix the crash bug when run binaries on older CPU machines .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/21287/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21287/1//COMMIT_MSG@7 PS1, Line 7: KUDU-3371 > Use a dedicated jira ID that addresses this specific crash. KUDU-3371 is a +1 http://gerrit.cloudera.org:8080/#/c/21287/1//COMMIT_MSG@11 PS1, Line 11: the nit: drop http://gerrit.cloudera.org:8080/#/c/21287/1//COMMIT_MSG@12 PS1, Line 12: the nit: a http://gerrit.cloudera.org:8080/#/c/21287/1//COMMIT_MSG@12 PS1, Line 12: (e.g. AVX512) Are you sure that was the only reason behind those crashes? As already pointed at other reviewers, it might help to summarize your findings in an upstream JIRA item. Let me know if you want me to help with providing information on crashes I see in my environment. Thank you! http://gerrit.cloudera.org:8080/#/c/21287/1//COMMIT_MSG@15 PS1, Line 15: WITH_SNAPPY > nit: Should this be 'PORTABLE'? +1 http://gerrit.cloudera.org:8080/#/c/21287/1//COMMIT_MSG@15 PS1, Line 15: build nit: building http://gerrit.cloudera.org:8080/#/c/21287/1//COMMIT_MSG@15 PS1, Line 15: enable nit: enables -- To view, visit http://gerrit.cloudera.org:8080/21287 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id30ae995c41a592fccbdb822bc1f457c5e6878ac Gerrit-Change-Number: 21287 Gerrit-PatchSet: 1 Gerrit-Owner: Yingchun Lai Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: KeDeng Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Thu, 02 May 2024 19:42:37 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3371 check for RocksDB dir presence upon opening FSManager
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21295 ) Change subject: KUDU-3371 check for RocksDB dir presence upon opening FSManager .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/21295/1/src/kudu/fs/data_dirs.cc File src/kudu/fs/data_dirs.cc: http://gerrit.cloudera.org:8080/#/c/21295/1/src/kudu/fs/data_dirs.cc@212 PS1, Line 212: // Initialize the directory only if it's healthy. : // Note: the unhealthy directories wi > Because the rocksdb instance will be Open() in RdbDir::Prepare(), this proc I'm not sure I understand: what exactly makes them not suitable to have in the constructor? There isn't any error handling in this code except for logging a WARNING message. Could you please clarify? -- To view, visit http://gerrit.cloudera.org:8080/21295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab4bffc6b902ab96edf0ca6c44f51c8db2670d52 Gerrit-Change-Number: 21295 Gerrit-PatchSet: 3 Gerrit-Owner: Yingchun Lai Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Thu, 02 May 2024 19:05:27 + Gerrit-HasComments: Yes