[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9867 ) Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow .. KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow This test was flaky in TSAN builds for two reasons: (1) even though taking the stack trace is always pretty fast, symbolization can sometimes take more than a second. If that was the case, the assertion that looked at the diagnostics log content would fail just because the expected log was not there _yet_. Fixing this issue using ASSERT_EVENTUALLY brought the test failure rate with 4 stress threads down from about 60% to about 0.3%. (2) with periodic stack dumping enabled, it was possible that the periodic dump was already in progress at the time that the overflow-triggered dump was triggered. In that case, the overflow-triggered dump would be ignored. That was expected behavior, but the test would fail due to not seeing the overflow listed as the "reason". This patch fixes this issue by disabling periodic dumping in the test. In order to disable periodic dumping but leave triggered dumping enabled, I changed the semantics of the configuration slightly. Previously, setting it to '0' would disable all dumping. Now, setting it to '0' disables periodic dumping but still leaves triggered dumps enabled. Setting it to '-1' disables all dumping. With both these changes I was able to loop the test 300 times with 4 stress threads in TSAN mode with no failures. Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e Reviewed-on: http://gerrit.cloudera.org:8080/9867 Reviewed-by: Adar DemboTested-by: Todd Lipcon --- M src/kudu/master/master-test.cc M src/kudu/server/diagnostics_log.cc 2 files changed, 26 insertions(+), 10 deletions(-) Approvals: Adar Dembo: Looks good to me, approved Todd Lipcon: Verified -- To view, visit http://gerrit.cloudera.org:8080/9867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e Gerrit-Change-Number: 9867 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
Todd Lipcon has removed a vote on this change. Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/9867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e Gerrit-Change-Number: 9867 Gerrit-PatchSet: 2 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9867 ) Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/9867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e Gerrit-Change-Number: 9867 Gerrit-PatchSet: 2 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 21:06:19 + Gerrit-HasComments: No
[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9867 ) Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/9867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e Gerrit-Change-Number: 9867 Gerrit-PatchSet: 2 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 16:26:19 + Gerrit-HasComments: No
[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9867 ) Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9867/1/src/kudu/master/master-test.cc File src/kudu/master/master-test.cc: http://gerrit.cloudera.org:8080/#/c/9867/1/src/kudu/master/master-test.cc@106 PS1, Line 106: DECLARE_int32(diagnostics_log_stack_traces_interval_ms); > Nit: alphabetical order (at least switch with master_inject_latency_on_tabl Done -- To view, visit http://gerrit.cloudera.org:8080/9867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e Gerrit-Change-Number: 9867 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 30 Mar 2018 16:09:42 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
Hello Mike Percy, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9867 to look at the new patch set (#2). Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow .. KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow This test was flaky in TSAN builds for two reasons: (1) even though taking the stack trace is always pretty fast, symbolization can sometimes take more than a second. If that was the case, the assertion that looked at the diagnostics log content would fail just because the expected log was not there _yet_. Fixing this issue using ASSERT_EVENTUALLY brought the test failure rate with 4 stress threads down from about 60% to about 0.3%. (2) with periodic stack dumping enabled, it was possible that the periodic dump was already in progress at the time that the overflow-triggered dump was triggered. In that case, the overflow-triggered dump would be ignored. That was expected behavior, but the test would fail due to not seeing the overflow listed as the "reason". This patch fixes this issue by disabling periodic dumping in the test. In order to disable periodic dumping but leave triggered dumping enabled, I changed the semantics of the configuration slightly. Previously, setting it to '0' would disable all dumping. Now, setting it to '0' disables periodic dumping but still leaves triggered dumps enabled. Setting it to '-1' disables all dumping. With both these changes I was able to loop the test 300 times with 4 stress threads in TSAN mode with no failures. Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e --- M src/kudu/master/master-test.cc M src/kudu/server/diagnostics_log.cc 2 files changed, 26 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/9867/2 -- To view, visit http://gerrit.cloudera.org:8080/9867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e Gerrit-Change-Number: 9867 Gerrit-PatchSet: 2 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy
[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9867 ) Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9867/1/src/kudu/master/master-test.cc File src/kudu/master/master-test.cc: http://gerrit.cloudera.org:8080/#/c/9867/1/src/kudu/master/master-test.cc@106 PS1, Line 106: DECLARE_int32(diagnostics_log_stack_traces_interval_ms); Nit: alphabetical order (at least switch with master_inject_latency_on_tablet_lookups_ms). -- To view, visit http://gerrit.cloudera.org:8080/9867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e Gerrit-Change-Number: 9867 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Fri, 30 Mar 2018 15:02:09 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
Hello Mike Percy, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/9867 to review the following change. Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow .. KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow This test was flaky in TSAN builds for two reasons: (1) even though taking the stack trace is always pretty fast, symbolization can sometimes take more than a second. If that was the case, the assertion that looked at the diagnostics log content would fail just because the expected log was not there _yet_. Fixing this issue using ASSERT_EVENTUALLY brought the test failure rate with 4 stress threads down from about 60% to about 0.3%. (2) with periodic stack dumping enabled, it was possible that the periodic dump was already in progress at the time that the overflow-triggered dump was triggered. In that case, the overflow-triggered dump would be ignored. That was expected behavior, but the test would fail due to not seeing the overflow listed as the "reason". This patch fixes this issue by disabling periodic dumping in the test. In order to disable periodic dumping but leave triggered dumping enabled, I changed the semantics of the configuration slightly. Previously, setting it to '0' would disable all dumping. Now, setting it to '0' disables periodic dumping but still leaves triggered dumps enabled. Setting it to '-1' disables all dumping. With both these changes I was able to loop the test 300 times with 4 stress threads in TSAN mode with no failures. Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e --- M src/kudu/master/master-test.cc M src/kudu/server/diagnostics_log.cc 2 files changed, 26 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/9867/1 -- To view, visit http://gerrit.cloudera.org:8080/9867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e Gerrit-Change-Number: 9867 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Mike Percy