[kudu-CR] KUDU-16 pt 1: add server-side limits on scanners
Hello Tidy Bot, Dan Burkert, David Ribeiro Alves, Kudu Jenkins, Adar Dembo, Grant Henke, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9783 to look at the new patch set (#11). Change subject: KUDU-16 pt 1: add server-side limits on scanners .. KUDU-16 pt 1: add server-side limits on scanners This patch adds the functionality to specify limits to the number of rows returned per tablet server scanner. Internally, each server-side scanner now tracks the number of rows it has already returned and adjusts the number of rows to serialize per batch based on this limit. Setting a non-positive limit is grounds for short circuiting the scan. Note: Use of this functionality by the client will come in a later patch. Change-Id: I3ca5ddff09f4910d12f80259bd11e4027f99aa7c --- M src/kudu/common/rowblock.cc M src/kudu/common/rowblock.h M src/kudu/common/scan_spec.cc M src/kudu/common/scan_spec.h M src/kudu/common/wire_protocol.cc M src/kudu/common/wire_protocol.h M src/kudu/tserver/scanners.cc M src/kudu/tserver/scanners.h M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/tserver.proto 11 files changed, 182 insertions(+), 35 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/9783/11 -- To view, visit http://gerrit.cloudera.org:8080/9783 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ca5ddff09f4910d12f80259bd11e4027f99aa7c Gerrit-Change-Number: 9783 Gerrit-PatchSet: 11 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] docs: add troubleshooting info on detecting ext2 and ext3 filesystems
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9871 ) Change subject: docs: add troubleshooting info on detecting ext2 and ext3 filesystems .. docs: add troubleshooting info on detecting ext2 and ext3 filesystems Change-Id: Ic4917e144690943e43bf5aa380b59379669846d9 Reviewed-on: http://gerrit.cloudera.org:8080/9871 Reviewed-by: Todd LipconTested-by: Kudu Jenkins --- M docs/troubleshooting.adoc 1 file changed, 13 insertions(+), 6 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/9871 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ic4917e144690943e43bf5aa380b59379669846d9 Gerrit-Change-Number: 9871 Gerrit-PatchSet: 3 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] docs: add troubleshooting info on detecting ext2 and ext3 filesystems
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9871 ) Change subject: docs: add troubleshooting info on detecting ext2 and ext3 filesystems .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/9871 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic4917e144690943e43bf5aa380b59379669846d9 Gerrit-Change-Number: 9871 Gerrit-PatchSet: 2 Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 21:37:20 + Gerrit-HasComments: No
[kudu-CR] KUDU-2384. Don't collect thread stacks when running under debugger
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9835 ) Change subject: KUDU-2384. Don't collect thread stacks when running under debugger .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/9835 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib17ffb9514e9b440b8d97843374d2a60913a398b Gerrit-Change-Number: 9835 Gerrit-PatchSet: 4 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 21:23:27 + Gerrit-HasComments: No
[kudu-CR] KUDU-2384. Don't collect thread stacks when running under debugger
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9835 to look at the new patch set (#4). Change subject: KUDU-2384. Don't collect thread stacks when running under debugger .. KUDU-2384. Don't collect thread stacks when running under debugger This changes the stack watchdog and the 'collect all stacks' utility code to not run when it appears that the process is running under a tracer such as gdb or strace. This makes debugging Kudu processes significantly easier because you don't need to know the magical incantation 'handle SIGUSR2 nostop noprint'. I tested manually by running a master with --diagnostics-log-stack-traces-interval-ms=10 and attaching to it with gdb. With this patch, I didn't see any signals sent. This patch also changes the Env EIO fault injection to never inject failures on /proc/. This is to avoid a TSAN warning since KernelStackWatchdog would otherwise access the env_inject_eio_globs flag concurrent to tests modifying it. Change-Id: Ib17ffb9514e9b440b8d97843374d2a60913a398b --- M src/kudu/util/debug-util.cc M src/kudu/util/debug-util.h M src/kudu/util/env_posix.cc M src/kudu/util/kernel_stack_watchdog.cc M src/kudu/util/os-util.cc M src/kudu/util/os-util.h 6 files changed, 82 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/35/9835/4 -- To view, visit http://gerrit.cloudera.org:8080/9835 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib17ffb9514e9b440b8d97843374d2a60913a398b Gerrit-Change-Number: 9835 Gerrit-PatchSet: 4 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] docs: add troubleshooting info on detecting ext2 and ext3 filesystems
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9871 ) Change subject: docs: add troubleshooting info on detecting ext2 and ext3 filesystems .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9871/1/docs/troubleshooting.adoc File docs/troubleshooting.adoc: http://gerrit.cloudera.org:8080/#/c/9871/1/docs/troubleshooting.adoc@59 PS1, Line 59: ext3, or ext4 formatted device; see link:https://unix.stackexchange.com/q/60723[this Stack > why don't we put the file and blkid examples here instead of linking to sta Well, if we're going to do that, I'd like to include the tune2fs example too (because I don't know exactly what UNIX tools someone has installed). And that's a lot of text. I figured this is rare enough that going to SE isn't such a big deal. -- To view, visit http://gerrit.cloudera.org:8080/9871 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic4917e144690943e43bf5aa380b59379669846d9 Gerrit-Change-Number: 9871 Gerrit-PatchSet: 1 Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 21:11:47 + Gerrit-HasComments: Yes
[kudu-CR] meta cache: improve multi-threaded scalability
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9760 ) Change subject: meta_cache: improve multi-threaded scalability .. meta_cache: improve multi-threaded scalability When profiling a multi-threaded 'kudu perf loadgen' I found the bottleneck was on the rw_spinlock acquisition in MetaCache::LookupTabletByKeyFastPath as well as some contention in RemoteTablet::stale(). This fixes the first by switching to a percpu_rwlock and the second by using an atomic bool instead of a spinlock-protected bool. This moved LookupTabletByKeyFastPath from the first line of the profile (14% of CPU) down to the 8th line (3% of CPU). Change-Id: I38c8a94ea177bfa4f4e2048355464f76a5daa2ba Reviewed-on: http://gerrit.cloudera.org:8080/9760 Reviewed-by: Adar DemboTested-by: Todd Lipcon --- M src/kudu/client/meta_cache.cc M src/kudu/client/meta_cache.h M src/kudu/util/locks.h 3 files changed, 25 insertions(+), 17 deletions(-) Approvals: Adar Dembo: Looks good to me, approved Todd Lipcon: Verified -- To view, visit http://gerrit.cloudera.org:8080/9760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I38c8a94ea177bfa4f4e2048355464f76a5daa2ba Gerrit-Change-Number: 9760 Gerrit-PatchSet: 5 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9867 ) Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow .. KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow This test was flaky in TSAN builds for two reasons: (1) even though taking the stack trace is always pretty fast, symbolization can sometimes take more than a second. If that was the case, the assertion that looked at the diagnostics log content would fail just because the expected log was not there _yet_. Fixing this issue using ASSERT_EVENTUALLY brought the test failure rate with 4 stress threads down from about 60% to about 0.3%. (2) with periodic stack dumping enabled, it was possible that the periodic dump was already in progress at the time that the overflow-triggered dump was triggered. In that case, the overflow-triggered dump would be ignored. That was expected behavior, but the test would fail due to not seeing the overflow listed as the "reason". This patch fixes this issue by disabling periodic dumping in the test. In order to disable periodic dumping but leave triggered dumping enabled, I changed the semantics of the configuration slightly. Previously, setting it to '0' would disable all dumping. Now, setting it to '0' disables periodic dumping but still leaves triggered dumps enabled. Setting it to '-1' disables all dumping. With both these changes I was able to loop the test 300 times with 4 stress threads in TSAN mode with no failures. Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e Reviewed-on: http://gerrit.cloudera.org:8080/9867 Reviewed-by: Adar DemboTested-by: Todd Lipcon --- M src/kudu/master/master-test.cc M src/kudu/server/diagnostics_log.cc 2 files changed, 26 insertions(+), 10 deletions(-) Approvals: Adar Dembo: Looks good to me, approved Todd Lipcon: Verified -- To view, visit http://gerrit.cloudera.org:8080/9867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e Gerrit-Change-Number: 9867 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
Todd Lipcon has removed a vote on this change. Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/9867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e Gerrit-Change-Number: 9867 Gerrit-PatchSet: 2 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9867 ) Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/9867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e Gerrit-Change-Number: 9867 Gerrit-PatchSet: 2 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 21:06:19 + Gerrit-HasComments: No
[kudu-CR] docs: add troubleshooting info on detecting ext2 and ext3 filesystems
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/9871 ) Change subject: docs: add troubleshooting info on detecting ext2 and ext3 filesystems .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9871/1/docs/troubleshooting.adoc File docs/troubleshooting.adoc: http://gerrit.cloudera.org:8080/#/c/9871/1/docs/troubleshooting.adoc@59 PS1, Line 59: ext3, or ext4 formatted device; see link:https://unix.stackexchange.com/q/60723[this Stack why don't we put the file and blkid examples here instead of linking to stack exchange? -- To view, visit http://gerrit.cloudera.org:8080/9871 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic4917e144690943e43bf5aa380b59379669846d9 Gerrit-Change-Number: 9871 Gerrit-PatchSet: 1 Gerrit-Owner: Adar DemboGerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 19:54:55 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-16 pt 1: add server-side limits on scanners
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9783 ) Change subject: KUDU-16 pt 1: add server-side limits on scanners .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/9783/10/src/kudu/common/wire_protocol.cc File src/kudu/common/wire_protocol.cc: http://gerrit.cloudera.org:8080/#/c/9783/10/src/kudu/common/wire_protocol.cc@805 PS10, Line 805:int64_t max_num_rows_to_return, > was thinking about this from a performance angle a bit more this morning. I Hmmm interesting, so the idea would be to resize the selection vector post-scan, eh? Makes sense; I'll try that out. -- To view, visit http://gerrit.cloudera.org:8080/9783 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ca5ddff09f4910d12f80259bd11e4027f99aa7c Gerrit-Change-Number: 9783 Gerrit-PatchSet: 10 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 19:53:26 + Gerrit-HasComments: Yes
[kudu-CR] thirdparty: add patch to fix printing large span stats in /memz
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9870 ) Change subject: thirdparty: add patch to fix printing large span stats in /memz .. thirdparty: add patch to fix printing large span stats in /memz This fixes a bug that I introduced upstream in gperftools by importing the upstream patch that fixed it. The issue was mismatched arguments to printf() which resulted in incorrect printing of stats in /memz. Without this fix, memz always claimed to have 128 large spans: >128 large *128 spans ~0.0 MiB;2.7 MiB cum; unmapped: >0.0 MiB; > 0.0 MiB cum With the fix it properly reports 0 spans on startup of an empty master: >128 large * 0 spans ~0.0 MiB;3.6 MiB cum; unmapped: >0.0 MiB; > 0.0 MiB cum Change-Id: I144830c069ec13017fd95b24aecf1489755b336f Reviewed-on: http://gerrit.cloudera.org:8080/9870 Reviewed-by: Will BerkeleyTested-by: Kudu Jenkins --- M thirdparty/download-thirdparty.sh A thirdparty/patches/gperftools-unbreak-memz.patch 2 files changed, 30 insertions(+), 1 deletion(-) Approvals: Will Berkeley: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/9870 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I144830c069ec13017fd95b24aecf1489755b336f Gerrit-Change-Number: 9870 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] docs: add troubleshooting info on detecting ext2 and ext3 filesystems
Hello Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/9871 to review the following change. Change subject: docs: add troubleshooting info on detecting ext2 and ext3 filesystems .. docs: add troubleshooting info on detecting ext2 and ext3 filesystems Change-Id: Ic4917e144690943e43bf5aa380b59379669846d9 --- M docs/troubleshooting.adoc 1 file changed, 13 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/71/9871/1 -- To view, visit http://gerrit.cloudera.org:8080/9871 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic4917e144690943e43bf5aa380b59379669846d9 Gerrit-Change-Number: 9871 Gerrit-PatchSet: 1 Gerrit-Owner: Adar DemboGerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-16 pt 1: add server-side limits on scanners
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9783 ) Change subject: KUDU-16 pt 1: add server-side limits on scanners .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/9783/10/src/kudu/common/wire_protocol.cc File src/kudu/common/wire_protocol.cc: http://gerrit.cloudera.org:8080/#/c/9783/10/src/kudu/common/wire_protocol.cc@805 PS10, Line 805:int64_t max_num_rows_to_return, was thinking about this from a performance angle a bit more this morning. I think it would be faster to actually do the limiting on the selection vector of the RowBlock rather than adding new conditionals into the CopyColumn code. That way the branches happen once up front instead of on every column, and the stuff like 'num_rows' calculation would just be adjusted automatically. Thoughts? -- To view, visit http://gerrit.cloudera.org:8080/9783 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ca5ddff09f4910d12f80259bd11e4027f99aa7c Gerrit-Change-Number: 9783 Gerrit-PatchSet: 10 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 18:56:03 + Gerrit-HasComments: Yes
[kudu-CR] thirdparty: add patch to fix printing large span stats in /memz
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9870 ) Change subject: thirdparty: add patch to fix printing large span stats in /memz .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/9870 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I144830c069ec13017fd95b24aecf1489755b336f Gerrit-Change-Number: 9870 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Fri, 30 Mar 2018 18:50:08 + Gerrit-HasComments: No
[kudu-CR] thirdparty: add patch to fix printing large span stats in /memz
Hello Will Berkeley, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/9870 to review the following change. Change subject: thirdparty: add patch to fix printing large span stats in /memz .. thirdparty: add patch to fix printing large span stats in /memz This fixes a bug that I introduced upstream in gperftools by importing the upstream patch that fixed it. The issue was mismatched arguments to printf() which resulted in incorrect printing of stats in /memz. Without this fix, memz always claimed to have 128 large spans: >128 large *128 spans ~0.0 MiB;2.7 MiB cum; unmapped: >0.0 MiB; > 0.0 MiB cum With the fix it properly reports 0 spans on startup of an empty master: >128 large * 0 spans ~0.0 MiB;3.6 MiB cum; unmapped: >0.0 MiB; > 0.0 MiB cum Change-Id: I144830c069ec13017fd95b24aecf1489755b336f --- M thirdparty/download-thirdparty.sh A thirdparty/patches/gperftools-unbreak-memz.patch 2 files changed, 30 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/9870/1 -- To view, visit http://gerrit.cloudera.org:8080/9870 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I144830c069ec13017fd95b24aecf1489755b336f Gerrit-Change-Number: 9870 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2191 (8/n): Integrate HmsCatalog into CatalogManager
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9863 ) Change subject: KUDU-2191 (8/n): Integrate HmsCatalog into CatalogManager .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/9863/2/src/kudu/integration-tests/master_hms-itest.cc File src/kudu/integration-tests/master_hms-itest.cc: http://gerrit.cloudera.org:8080/#/c/9863/2/src/kudu/integration-tests/master_hms-itest.cc@215 PS2, Line 215: LOG(INFO) << "STOPPING HMS!!!"; Still want this? http://gerrit.cloudera.org:8080/#/c/9863/2/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/9863/2/src/kudu/master/catalog_manager.cc@2280 PS2, Line 2280: // 7. Update sys-catalog with the new table schema and tablets to add/drop. Need to renumber the rest of the steps. -- To view, visit http://gerrit.cloudera.org:8080/9863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie68e143c3c317c7690af097e6485934feb1010b4 Gerrit-Change-Number: 9863 Gerrit-PatchSet: 2 Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 17:40:46 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-16 pt 1: add server-side limits on scanners
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9783 ) Change subject: KUDU-16 pt 1: add server-side limits on scanners .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/tserver/tablet_service.cc@502 PS9, Line 502: Returns number of > It's necessary for the scanned rows metrics. AFAICT, scanners (and their in I meant, scanners can *span* multiple *batches*, and the row collectors don't, and so we currently update metrics once per batch. -- To view, visit http://gerrit.cloudera.org:8080/9783 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ca5ddff09f4910d12f80259bd11e4027f99aa7c Gerrit-Change-Number: 9783 Gerrit-PatchSet: 10 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 17:37:59 + Gerrit-HasComments: Yes
[kudu-CR] rowset metadata: cache min/max encoded keys
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9372 ) Change subject: rowset_metadata: cache min/max encoded keys .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/9372/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9372/6//COMMIT_MSG@29 PS6, Line 29: tablet. I repeated this, configuring Kudu to read the keys from the > This seems to be missing the punch line. Is this implying that, with the pa We could cut out the 15s difference (these two rows are time spent bootstrapping when reading keys from the rowset meta vs reading from the data blocks. Updated to be a bit more clear. -- To view, visit http://gerrit.cloudera.org:8080/9372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37d6f7160e3a7188753684e063963110f70e9b8d Gerrit-Change-Number: 9372 Gerrit-PatchSet: 7 Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 17:35:33 + Gerrit-HasComments: Yes
[kudu-CR] rowset metadata: cache min/max encoded keys
Hello David Ribeiro Alves, Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9372 to look at the new patch set (#7). Change subject: rowset_metadata: cache min/max encoded keys .. rowset_metadata: cache min/max encoded keys This patch adds a new flag rowset_metadata_store_keys that, when true, will indicate that Kudu should duplicate diskrowset min/max keys into the rowset metadata. Doing so allows Kudu to read the keys from tablet metadata and bootstrap tablets without having to fully initializing the CFileReaders for the key columns of each rowset. A small test is added to tablet_server-test that ensures we don't read any extraneous bytes when starting up the tablet server if we're reading keys from the rowset metadata. I benchmarked this with ~50GB of flushed YCSB data (92 tablets of varying sizes) on a single node with 4 data directories and a separate WAL/metadata directory. To set up, I let the server flush/compact for a while so bootstrap times wouldn't be dominated by reading WAL segments, and set rowset_metadata_store_keys to true so the tserver had the option of reading the cached keys from the rowset metadata at startup. With the above setup, I started the tserver with a disabled maintenance manager (to avoid further IO) and waited for the tablets to get to a RUNNING state, recording the sum of the logged bootstrap times of each tablet. I repeated this, configuring Kudu to read the keys from the rowset metadata, and to read the keys from the data blocks, dropping OS caches in between runs. The results are below. Run number: 1 2 3 Avg Reading cached keys (s): 26.430 24.143 20.826 23.800 Not reading cached keys (s): 40.578 38.428 37.093 38.700 Based on this, ~15 seconds worth of bootstrapping time was spent on initializing the key index readers, that could be avoided by reading the keys from the rowset metadata instead. Change-Id: I37d6f7160e3a7188753684e063963110f70e9b8d --- M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/rowset_metadata.cc M src/kudu/tablet/rowset_metadata.h M src/kudu/tserver/tablet_server-test.cc 7 files changed, 124 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/72/9372/7 -- To view, visit http://gerrit.cloudera.org:8080/9372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37d6f7160e3a7188753684e063963110f70e9b8d Gerrit-Change-Number: 9372 Gerrit-PatchSet: 7 Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2191 (7/n): HmsCatalog
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9862 ) Change subject: KUDU-2191 (7/n): HmsCatalog .. Patch Set 2: (14 comments) The IWYU failures are weird; could you bug Alexey? http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/CMakeLists.txt File src/kudu/hms/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/CMakeLists.txt@85 PS2, Line 85: ADD_KUDU_TEST(hms_catalog-test RUN_SERIAL true NUM_SHARDS 4) Nit: should precede hms_client-test alphabetically. http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc File src/kudu/hms/hms_catalog-test.cc: http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc@168 PS2, Line 168: ASSERT_OK(hms_->Stop()); : ASSERT_OK(hms_client_->Stop()); Don't the destructors do this automatically? http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc@242 PS2, Line 242: string table_id = "table-id"; : string hms_database_name = "default"; : string hms_table_name = "table_name"; : string table_name = Substitute("$0.$1", hms_database_name, hms_table_name); : string hms_altered_table_name = "altered_table_name"; : string altered_table_name = Substitute("$0.$1", hms_database_name, hms_altered_table_name); Add 'const' and use kVariableName notation, since these are constants. http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc@263 PS2, Line 263: ASSERT_OK(hms_catalog_->DropTable(table_id, altered_table_name)); And then verify that the table is gone? http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc@272 PS2, Line 272: CHECK_OK Could ASSERT_OK here too. http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc@280 PS2, Line 280: ASSERT_OK(hms_client_->GetTable("default", "foo", )); To make it clearer that AlterTable will create the entry, could you assert that it doesn't exist before the alter? Below as well. http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc@287 PS2, Line 287: // Drop a table containing a non Hive-compatible character, and ensure it doesn't fail. And verify that the tables are gone? Or is that a given? http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc@297 PS2, Line 297: string table_id = "abc123"; : string hms_database = "default"; : : string hms_external_table = "external_table"; : string external_table_name = Substitute("$0.$1", hms_database, hms_external_table); : : string hms_kudu_table = "kudu_table"; : string kudu_table_name = Substitute("$0.$1", hms_database, hms_kudu_table); const and kName. http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc@358 PS2, Line 358: // Drop a Kudu table with no corresponding HMS entry. Verify that the table is gone? http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc@383 PS2, Line 383: operations Nit: and ensure that the same operations succeed It's a bit pedantic, but it's important for these to be constant to prove that only one thing has changed: the status of the HMS. http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog.cc File src/kudu/hms/hms_catalog.cc: http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog.cc@65 PS2, Line 65: // Validated in master.cc. Nit: if this applies to hive_metastore_sasl_enabled, please move it to just before the definition rather than just after. http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog.cc@70 PS2, Line 70: TAG_FLAG(hive_metastore_retry_count, advanced); : TAG_FLAG(hive_metastore_retry_count, experimental); FWIW, I think experimental implies advanced so I don't think you need to tag with both. http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog.cc@74 PS2, Line 74: DEFINE_int32(hive_metastore_send_timeout, 60, : "Configures the socket send timeout, in seconds, for Thrift " : "connections to the Hive Metastore."); : TAG_FLAG(hive_metastore_send_timeout, advanced); : TAG_FLAG(hive_metastore_send_timeout, experimental); : TAG_FLAG(hive_metastore_send_timeout, runtime); : : DEFINE_int32(hive_metastore_recv_timeout, 60, : "Configures the socket receive timeout, in seconds, for Thrift " : "connections to the Hive Metastore."); : TAG_FLAG(hive_metastore_recv_timeout, advanced); : TAG_FLAG(hive_metastore_recv_timeout, experimental); : TAG_FLAG(hive_metastore_recv_timeout, runtime); : :
[kudu-CR] KUDU-2191 (6/n): Fixups to the C++ HMS client
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9861 ) Change subject: KUDU-2191 (6/n): Fixups to the C++ HMS client .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/9861/1/src/kudu/hms/hms_client-test.cc File src/kudu/hms/hms_client-test.cc: http://gerrit.cloudera.org:8080/#/c/9861/1/src/kudu/hms/hms_client-test.cc@20 PS1, Line 20: #include So IWYU wants you to remove all of these new includes. Are all of them necessary to compile this particular patch (as opposed to others in the series)? If so, could you annotate each one with the appropriate IWYU pragma? http://gerrit.cloudera.org:8080/#/c/9861/1/src/kudu/hms/hms_client.h File src/kudu/hms/hms_client.h: http://gerrit.cloudera.org:8080/#/c/9861/1/src/kudu/hms/hms_client.h@22 PS1, Line 22: #include My guess is that this is already included through ThriftHiveMetastore.h, which is why IWYU wants you to remove it. http://gerrit.cloudera.org:8080/#/c/9861/1/src/kudu/hms/hms_client.h@33 PS1, Line 33: class NotificationEvent; : class Partition; What happens if you drop these? Perhaps they're declared/defined in the ThriftHiveMetastore.h chain? -- To view, visit http://gerrit.cloudera.org:8080/9861 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I86a8d984e67fe6b4c004725828b4296476c2389e Gerrit-Change-Number: 9861 Gerrit-PatchSet: 1 Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 16:59:09 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2384. Don't collect thread stacks when running under debugger
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9835 ) Change subject: KUDU-2384. Don't collect thread stacks when running under debugger .. Patch Set 3: > > That said, the test in question (TestCreateWithFailedDirs) already resets > > env_inject_eio to 0 when it finishes; if it didn't, we'd also see a warning > > that we couldn't clean up the test's temp dir. But I guess here the issue > > isn't with env_inject_eio, but with env_inject_eio_globs? We don't > > typically reset that one. > > Yea, the issue is with env_inject_eio_globs which is a string and thus > particularly unsafe to read. The odd thing, though, is that we only read > env_inject_eio_globs when env_inject_eio is non-zero, but it appears we're > still reading it. So perhaps we aren't resetting env_inject_eio properly in > all cases or somesuch. I think it's a real data race: right before the main thread (T1) resets env_inject_eio, the kernel stack watchdog thread (T2) gets scheduled and reads the non-zero gflag value. Then T1 is rescheduled and resets to 0, but it's too late: when T2 is rescheduled, it will read env_inject_eio_globs. -- To view, visit http://gerrit.cloudera.org:8080/9835 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib17ffb9514e9b440b8d97843374d2a60913a398b Gerrit-Change-Number: 9835 Gerrit-PatchSet: 3 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 16:51:06 + Gerrit-HasComments: No
[kudu-CR] KUDU-2191 (6/n): Hive Metastore catalog manager integration
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8312 ) Change subject: KUDU-2191 (6/n): Hive Metastore catalog manager integration .. Patch Set 10: (7 comments) http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/hms/hms_client.cc File src/kudu/hms/hms_client.cc: http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/hms/hms_client.cc@202 PS8, Line 202: // operations properly. > Yes, I believe it's taken directly from the hive-site.xml so it's possible Gotcha. Thanks for the explanation. http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc File src/kudu/integration-tests/master_hms-itest.cc: http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc@162 PS8, Line 162: } : : // Checks that a table does not exist in the Kudu and HMS catalogs. : void CheckCatalogsNegative(const string& database_n > I've tried to reduce the number of tests cases since spinning up an HMS is No good ideas here short of changing the MiniHMS lifecycle to outlive its minicluster (so that it's reset once per test suite rather than once per test). But that's a really bad idea for other reasons. If you have the time, it might be fruitful to profile the HMS startup and figure out why it takes so long. If that time is spent searching an enormous classpath, we may be able to prune it to improve startup time. http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc@268 PS8, Line 268: ASSERT_TRUE(s.IsIllegalState()) << s.ToString(); : ASSERT_STR_CONTAINS(s.ToString(), " > You may be reading too much into this. This is just setting up a hypotheti But isn't the CheckCatalogs() call on L277 ensuring that the Kudu table (renamed_table_name) and the HMS table (hms_renamed_table_name) have the same names? I was still confused so I went through the entire test and wrote down the actions taking place: // Create the Kudu table. Create Kudu table rename_db.kudu_table - Implicit create of same HMS table // Create a non-Kudu HMS table entry. Create non-Kudu HMS table rename_db.external_table // Drop the HMS table entry and rename the table. This tests that the // HmsCatalog will create a new entry when necessary. Drop HMS table rename_db.kudu_table Rename Kudu table from rename_db.kudu_table to rename_db.renamed_table - Implicit create of Kudu HMS table rename_db.renamed_table // Rename the table back to the original name. This tests the happy path. Rename Kudu table from rename_db.renamed_table to rename_db.kudu_table - Implicit rename of Kudu HMS table rename_db.renamed_table to rename_db.kudu_table // Drop the HMS table entry, then create a non-Kudu table entry in it's place, // and attempt to rename the table. Drop Kudu HMS table rename_db.kudu_table Create non-Kudu HMS table rename_db.kudu_table Rename Kudu table from rename_db.kudu_table to rename_db.renamed_table - No implicit rename because rename_db.kudu_table is non-Kudu HMS entry - Implicit create of Kudu HMS table rename_db.renamed_table What I was missing was that in the last Kudu rename, not only does Kudu NOT rename the existing HMS entry (because it's a non-Kudu entry), but it'll implicitly create a new HMS entry for the destination table name. A couple things that would make this easier to understand: 1. hms_client_->CreateTable() and CreateTable() are similar looking; rename CreateTable() to something else. 2. The table name variables are also pretty similar looking; rename them so that they stand out from one another more. http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/catalog_manager.cc@1755 PS9, Line 1755: Schema schema; > It's a hack that allows using the RETURN_NOT_OK_LOG macro inside the lambda Hmm, I do think it's somewhat unintuitive; I could see someone else reading this and thinking that they've misunderstood how ScopedCleanup works. I wouldn't mind it if there was a comment saying the return value is ignored. Or if you wrote it out manually. http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc File src/kudu/master/hms_catalog.cc: http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@201 PS9, Line 201: HMS table entry: " : << s.ToString(); : return create_table(); > Well you're right to expect there are TOCTOU races going on all over the pl I don't understand your explanation. On L172 we retrieve the original HMS table entry. On L207, if that entry's name matches the new name, we short-circuit. What's stopping another HMS client (perhaps another Kudu cluster?) from renaming the entry from the original
[kudu-CR] KUDU-2384. Don't collect thread stacks when running under debugger
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9835 ) Change subject: KUDU-2384. Don't collect thread stacks when running under debugger .. Patch Set 3: > Patch Set 3: > That said, the test in question (TestCreateWithFailedDirs) already resets > env_inject_eio to 0 when it finishes; if it didn't, we'd also see a warning > that we couldn't clean up the test's temp dir. But I guess here the issue > isn't with env_inject_eio, but with env_inject_eio_globs? We don't typically > reset that one. Yea, the issue is with env_inject_eio_globs which is a string and thus particularly unsafe to read. The odd thing, though, is that we only read env_inject_eio_globs when env_inject_eio is non-zero, but it appears we're still reading it. So perhaps we aren't resetting env_inject_eio properly in all cases or somesuch. Anyway I'll do option (c) since I think it makes sense anyway - we don't want to inject failures reading from proc in general. -- To view, visit http://gerrit.cloudera.org:8080/9835 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib17ffb9514e9b440b8d97843374d2a60913a398b Gerrit-Change-Number: 9835 Gerrit-PatchSet: 3 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 16:37:19 + Gerrit-HasComments: No
[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9867 ) Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/9867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e Gerrit-Change-Number: 9867 Gerrit-PatchSet: 2 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 16:26:19 + Gerrit-HasComments: No
[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9867 ) Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9867/1/src/kudu/master/master-test.cc File src/kudu/master/master-test.cc: http://gerrit.cloudera.org:8080/#/c/9867/1/src/kudu/master/master-test.cc@106 PS1, Line 106: DECLARE_int32(diagnostics_log_stack_traces_interval_ms); > Nit: alphabetical order (at least switch with master_inject_latency_on_tabl Done -- To view, visit http://gerrit.cloudera.org:8080/9867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e Gerrit-Change-Number: 9867 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 16:09:42 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
Hello Mike Percy, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9867 to look at the new patch set (#2). Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow .. KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow This test was flaky in TSAN builds for two reasons: (1) even though taking the stack trace is always pretty fast, symbolization can sometimes take more than a second. If that was the case, the assertion that looked at the diagnostics log content would fail just because the expected log was not there _yet_. Fixing this issue using ASSERT_EVENTUALLY brought the test failure rate with 4 stress threads down from about 60% to about 0.3%. (2) with periodic stack dumping enabled, it was possible that the periodic dump was already in progress at the time that the overflow-triggered dump was triggered. In that case, the overflow-triggered dump would be ignored. That was expected behavior, but the test would fail due to not seeing the overflow listed as the "reason". This patch fixes this issue by disabling periodic dumping in the test. In order to disable periodic dumping but leave triggered dumping enabled, I changed the semantics of the configuration slightly. Previously, setting it to '0' would disable all dumping. Now, setting it to '0' disables periodic dumping but still leaves triggered dumps enabled. Setting it to '-1' disables all dumping. With both these changes I was able to loop the test 300 times with 4 stress threads in TSAN mode with no failures. Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e --- M src/kudu/master/master-test.cc M src/kudu/server/diagnostics_log.cc 2 files changed, 26 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/9867/2 -- To view, visit http://gerrit.cloudera.org:8080/9867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e Gerrit-Change-Number: 9867 Gerrit-PatchSet: 2 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy
[kudu-CR] KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9864 ) Change subject: KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot .. Patch Set 3: Verified+1 Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/9864 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0 Gerrit-Change-Number: 9864 Gerrit-PatchSet: 3 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 16:07:37 + Gerrit-HasComments: No
[kudu-CR] KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9864 ) Change subject: KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot .. KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot This test was flaky with two different failure modes which I believe have the same root cause: when the test starts, it's possible that there is still some thread from a prior test case in the process of shutting down. Even though we join() on our threads religiously, there is a very short period of time after join() returns where the dying thread is still listed in /proc/self/task/. This meant that we would sometimes fail the assertion that checks that the number of returned stacks matches the number of running threads (because the thread finished exiting just before we called SnapshotAllStacks). In other cases we would still see that exiting thread, but it would exit before responding to our stack trace signal, so we'd see a "timed out" response. The fix is a little hacky - we calculate the expected number of running threads and ASSERT_EVENTUALLY to wait until reality matches expectations. Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0 Reviewed-on: http://gerrit.cloudera.org:8080/9864 Reviewed-by: Todd LipconTested-by: Todd Lipcon --- M src/kudu/util/debug-util-test.cc 1 file changed, 30 insertions(+), 6 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/9864 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0 Gerrit-Change-Number: 9864 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot
Hello Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9864 to look at the new patch set (#3). Change subject: KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot .. KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot This test was flaky with two different failure modes which I believe have the same root cause: when the test starts, it's possible that there is still some thread from a prior test case in the process of shutting down. Even though we join() on our threads religiously, there is a very short period of time after join() returns where the dying thread is still listed in /proc/self/task/. This meant that we would sometimes fail the assertion that checks that the number of returned stacks matches the number of running threads (because the thread finished exiting just before we called SnapshotAllStacks). In other cases we would still see that exiting thread, but it would exit before responding to our stack trace signal, so we'd see a "timed out" response. The fix is a little hacky - we calculate the expected number of running threads and ASSERT_EVENTUALLY to wait until reality matches expectations. Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0 --- M src/kudu/util/debug-util-test.cc 1 file changed, 30 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/9864/3 -- To view, visit http://gerrit.cloudera.org:8080/9864 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0 Gerrit-Change-Number: 9864 Gerrit-PatchSet: 3 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9864 ) Change subject: KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/9864/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9864/2//COMMIT_MSG@17 PS2, Line 17: exitinng > exiting I bet you can't wait for my new laptop to arrive -- To view, visit http://gerrit.cloudera.org:8080/9864 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0 Gerrit-Change-Number: 9864 Gerrit-PatchSet: 2 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 16:06:46 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9866 ) Change subject: KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp .. KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp This fixes the following TSAN race: WARNING: ThreadSanitizer: data race (pid=17822) Read of size 1 at 0x7b4c54e8 by thread T59 (mutexes: write M1750): ... #3 strings::internal::SubstituteArg::SubstituteArg(std::__1::basic_stringconst&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/strings/substitute.h:76 (libtserver.so+0x9edb0) #4 kudu::MaintenanceManager::LogPrefix() const /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:545:31 (libkudu_util.so+0x167791) #5 kudu::MaintenanceManager::UnregisterOp(kudu::MaintenanceOp*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:235:3 (libkudu_util.so+0x165963) #6 kudu::MaintenanceOp::Unregister() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:123:13 (libkudu_util.so+0x1654fe) #7 kudu::tablet::Tablet::UnregisterMaintenanceOps() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tablet/tablet.cc:1405:9 (libtablet.so+0xfb5af) #8 kudu::tablet::TabletReplica::Stop() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tablet/tablet_replica.cc:271:25 (libtablet.so+0x146e66) #9 kudu::tserver::TSTabletManager::DeleteTablet(std::__1::basic_string c Previous write of size 8 at 0x7b4c54e8 by main thread: #0 memcpy /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:655 (kudu-tserver+0x449e4c) #1 std::__1::basic_string ::__move_assign(std::__1::basic_string &, std::__1::integral_constant ) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/c++/v1/string:2044:18 (libkudu_util.so+0x16664d) #2 std::__1::basic_string ::operator=(std::__1::basic_string &&) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/c++/v1/string:2055 (libkudu_util.so+0x16664d) #3 kudu::MaintenanceManager::Init(std::__1::basic_string ) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:169 (libkudu_util.so+0x16664d) ... The race is on the 'server_uuid_' field in the MaintenanceManager. This is set during startup, but was being set _after_ calls such as UnregisterOp could be made as seen above. That means the UnregisterOp call could either see an empty UUID or even crash due to the above race. This simply rejiggers the MaintenanceManager startup to take the UUID in as a constructor parameter instead, and to instantiate the object slightly later during startup. Change-Id: Id06731f56eb98146f7b88541b936c0026b781b16 Reviewed-on: http://gerrit.cloudera.org:8080/9866 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo --- M src/kudu/master/master.cc M src/kudu/tserver/tablet_server.cc M src/kudu/util/maintenance_manager-test.cc M src/kudu/util/maintenance_manager.cc M src/kudu/util/maintenance_manager.h 5 files changed, 24 insertions(+), 16 deletions(-) Approvals: Kudu Jenkins: Verified Adar Dembo: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/9866 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Id06731f56eb98146f7b88541b936c0026b781b16 Gerrit-Change-Number: 9866 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9866 ) Change subject: KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9866/1/src/kudu/tserver/tablet_server.cc File src/kudu/tserver/tablet_server.cc: http://gerrit.cloudera.org:8080/#/c/9866/1/src/kudu/tserver/tablet_server.cc@97 PS1, Line 97: maintenance_manager_.reset(new MaintenanceManager( > I think this should move up above the path handler registration, because th yea, it doesn't start actually accepting connections until you call Start() which happens in ServerBase::Start(), called by TabletServer::Start(), which happens after Init() -- To view, visit http://gerrit.cloudera.org:8080/9866 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id06731f56eb98146f7b88541b936c0026b781b16 Gerrit-Change-Number: 9866 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 16:06:12 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9867 ) Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9867/1/src/kudu/master/master-test.cc File src/kudu/master/master-test.cc: http://gerrit.cloudera.org:8080/#/c/9867/1/src/kudu/master/master-test.cc@106 PS1, Line 106: DECLARE_int32(diagnostics_log_stack_traces_interval_ms); Nit: alphabetical order (at least switch with master_inject_latency_on_tablet_lookups_ms). -- To view, visit http://gerrit.cloudera.org:8080/9867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e Gerrit-Change-Number: 9867 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Fri, 30 Mar 2018 15:02:09 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9866 ) Change subject: KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/9866/1/src/kudu/tserver/tablet_server.cc File src/kudu/tserver/tablet_server.cc: http://gerrit.cloudera.org:8080/#/c/9866/1/src/kudu/tserver/tablet_server.cc@97 PS1, Line 97: maintenance_manager_.reset(new MaintenanceManager( I think this should move up above the path handler registration, because the MM can be dereferenced by a handler. Unless we're not actually listening on any ports yet, in which case it's fine. -- To view, visit http://gerrit.cloudera.org:8080/9866 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id06731f56eb98146f7b88541b936c0026b781b16 Gerrit-Change-Number: 9866 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Comment-Date: Fri, 30 Mar 2018 14:59:45 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9864 ) Change subject: KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot .. Patch Set 2: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/9864/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9864/2//COMMIT_MSG@17 PS2, Line 17: exitinng exiting -- To view, visit http://gerrit.cloudera.org:8080/9864 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0 Gerrit-Change-Number: 9864 Gerrit-PatchSet: 2 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 14:42:26 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2293 fix cleanup after failed copies
Hello Mike Percy, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9452 to look at the new patch set (#5). Change subject: KUDU-2293 fix cleanup after failed copies .. KUDU-2293 fix cleanup after failed copies Before, when a tablet server failed to tablet copy, Kudu would perform a best-effort cleanup of the partially-copied replica and leave the tablet tombstoned. If this tombstoning were to fail due to disk issues (e.g. out of space), Kudu would allow this and tablet would remain in TABLET_DATA_COPYING both in-memory and on-disk. This would lead to a CHECK failure if another tablet copy were started for the same replica, as the server would attempt to copy over a replica with data already marked TABLET_DATA_COPYING. This behavior arose from trying to balance two invariants: - keep on-disk state consistent with in-memory state when possible - when a tablet copy fails, leave it in as dead of a state as we can (i.e. TABLET_DATA_TOMBSTONED with no transitions in progress) This patch updates the Abort() logic to lean towards the latter invariant: if a tablet copy fails, at least its in-memory state will be set as TABLET_DATA_TOMBSTONED. This may not be true on-disk, but that's okay because either 1) the tablet server will eventually overwrite it via another tablet copy (at which time its data must _not_ be in the TABLET_DATA_COPYING state), or 2) the server will be restarted and the tablet will be tombstoned upon seeing a non-TABLET_DATA_READY state anyway. Additionally, we previously used the tablet copy internal state machine to determine whether to call Abort(). This wasn't always right, since the state machine updates when various stages complete, rather than when various stages start, and w.r.t. cleanup, it's useful to know when we have already begun to update state on disk, and when we have finished. So in addition to the above changes to Abort(), this patch also registers a function to be called by TabletCopyClient's destructor that calls Abort() if cleanup is necessary (i.e. if a tablet copy has started and not completed). A test is added to tablet_copy-itest that tests failures to copy, ensuring that the tablet is left in such an state that further copies are possible without crashing. The test uses EIO injection to fail the copies, but the logic is the same as if full disks were used instead. Change-Id: I0459d484a3064aa2de392328e2910c9f6741be04 --- M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h 4 files changed, 157 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/9452/5 -- To view, visit http://gerrit.cloudera.org:8080/9452 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0459d484a3064aa2de392328e2910c9f6741be04 Gerrit-Change-Number: 9452 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy
[kudu-CR] KUDU-16 pt 1: add server-side limits on scanners
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9783 ) Change subject: KUDU-16 pt 1: add server-side limits on scanners .. Patch Set 9: (7 comments) http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/common/wire_protocol-test.cc File src/kudu/common/wire_protocol-test.cc: http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/common/wire_protocol-test.cc@323 PS9, Line 323: boost::none /* max_num_rows_to_return */, true /* pad timestamps */); > if you write this as: /* max_num_rows_to_return= */boost::none then clang-t Done http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/common/wire_protocol.cc File src/kudu/common/wire_protocol.cc: http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/common/wire_protocol.cc@762 PS9, Line 762: max_num_rows_to_return > we can get rid of some branching here by doing this once up front: Done http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/common/wire_protocol.cc@771 PS9, Line 771: (*max_num_rows_to_return - rows_returned) > can also avoid this by updating a 'max_rows' above incrementally Done http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/common/wire_protocol.cc@838 PS9, Line 838: num_rows > what's the purpose of allowing this to be negative in this function? can't Done http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/tserver/tablet_server-test.cc File src/kudu/tserver/tablet_server-test.cc: http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/tserver/tablet_server-test.cc@2072 PS9, Line 2072: lmits > typo Done http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/tserver/tablet_service.cc@485 PS9, Line 485: optional > maybe this can be non-optional and simplify code paths by just setting it t Nice, I like it. http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/tserver/tablet_service.cc@502 PS9, Line 502: num_rows_returned_ > do we still need this member if it's redundant with the one inside the scan It's necessary for the scanned rows metrics. AFAICT, scanners (and their internal count) can batches multiple scans, whereas the metrics get updated after each HandleRowBlock() (per each batch's result collector). I'm not sure there's elegant way to refactor this out. -- To view, visit http://gerrit.cloudera.org:8080/9783 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ca5ddff09f4910d12f80259bd11e4027f99aa7c Gerrit-Change-Number: 9783 Gerrit-PatchSet: 9 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 07:52:49 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-16 pt 2: add client-side limits on scanners
Hello David Ribeiro Alves, Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9790 to look at the new patch set (#5). Change subject: KUDU-16 pt 2: add client-side limits on scanners .. KUDU-16 pt 2: add client-side limits on scanners This patch adds a public API to allow the specification of per-client-side-scanner limits on the number of rows returned. Each scanner will maintain a count of the number of rows already read, and adjust the server-side limit upon sending the next scan request. A couple of tests are included to verify that the limits act as expected. I also verified that lowering the limit reduces the number of bytes read on disk (at the granularity of a single scan batch at a time). Note: this patch does not implement the behavior for scan tokens. Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d --- M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/scan_configuration.cc M src/kudu/client/scan_configuration.h M src/kudu/client/scanner-internal.cc M src/kudu/client/scanner-internal.h 7 files changed, 184 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/9790/5 -- To view, visit http://gerrit.cloudera.org:8080/9790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d Gerrit-Change-Number: 9790 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon