[kudu-CR] [tests] fix flake in TestRandomHistoryGCWorkload
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11945 )

Change subject: [tests] fix flake in TestRandomHistoryGCWorkload
..

[tests] fix flake in TestRandomHistoryGCWorkload

This patch fixes a flake most prominent in TSAN builds for the
RandomizedTabletHistoryGcITest.TestRandomHistoryGCWorkload scenario
of the tablet_history_gc-itest suite.

Before:
  dist_test --stress_cpu_threads=16: 256 out of 256 failed
  http://dist-test.cloudera.org/job?job_id=aserbin.1542394875.56611

After:
  dist_test --stress_cpu_threads=16: 0 out of 256 failed
  http://dist-test.cloudera.org/job?job_id=aserbin.1542397028.71131

In addition, this patch also contains a minor code clean up of the
tablet_history_gc-itest suite to take advantage of the 'using'
declarations.

Change-Id: I8d146d4b83c8d488d3c1766a53fe4a4d322b6590
Reviewed-on: http://gerrit.cloudera.org:8080/11945
Reviewed-by: Adar Dembo
Tested-by: Kudu Jenkins
---
M src/kudu/integration-tests/tablet_history_gc-itest.cc
1 file changed, 15 insertions(+), 13 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/11945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8d146d4b83c8d488d3c1766a53fe4a4d322b6590
Gerrit-Change-Number: 11945
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [tests] fix flake in TestRandomHistoryGCWorkload
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11945 )

Change subject: [tests] fix flake in TestRandomHistoryGCWorkload
..

Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11945/1/src/kudu/integration-tests/tablet_history_gc-itest.cc
File src/kudu/integration-tests/tablet_history_gc-itest.cc:

http://gerrit.cloudera.org:8080/#/c/11945/1/src/kudu/integration-tests/tablet_history_gc-itest.cc@519
PS1, Line 519: # if !defined(THREAD_SANITIZER)
: OverrideFlagForSlowTests("test_num_rounds",
: Substitute("$0", FLAGS_test_num_rounds * 5));
: # endif
> So before this change, in TSAN mode, the test would time out due to the hig

Yes -- the workload we use in this scenario, combined with manual compaction/flushing, leads to a situation where too many rowsets accumulate and it takes longer and longer for write operations (especially updates) to complete. Eventually, it gets really ugly:

W1116 19:03:13.326668 7226 rpcz_store.cc:253] Call kudu.tserver.TabletServerService.Write from 127.0.0.1:46428 (ReqId={client: ae42f5eb2c2344068da4093cccfe156c, seq_no=38, attempt_no=0}) took 19083 ms (19.1 s). Client timeout 1 ms (20 s)
W1116 19:03:13.329354 7226 rpcz_store.cc:259] Trace:
1116 19:02:54.243665 (+ 0us) service_pool.cc:162] Inserting onto call queue
1116 19:02:54.243908 (+ 243us) service_pool.cc:221] Handling call
1116 19:03:13.326412 (+19082504us) inbound_call.cc:162] Queueing success response
Related trace 'txn':
1116 19:02:54.251341 (+ 0us) write_transaction.cc:100] PREPARE: Starting
1116 19:02:54.251519 (+ 178us) write_transaction.cc:267] Acquiring schema lock in shared mode
1116 19:02:54.251572 (+53us) write_transaction.cc:270] Acquired schema lock
1116 19:02:54.251607 (+35us) tablet.cc:437] PREPARE: Decoding operations
1116 19:02:54.282224 (+ 30617us) tablet.cc:459] PREPARE: Acquiring locks for 626 operations
1116 19:02:54.390295 (+108071us) tablet.cc:463] PREPARE: locks acquired
1116 19:02:54.390316 (+21us) write_transaction.cc:125] PREPARE: finished.
1116 19:02:54.390479 (+ 163us) write_transaction.cc:135] Start()
1116 19:02:54.398643 (+ 8164us) write_transaction.cc:140] Timestamp: P: 24 usec, L: 179
1116 19:02:54.399618 (+ 975us) log.cc:587] Serialized 4555 byte log entry
1116 19:02:54.411726 (+ 12108us) write_transaction.cc:148] APPLY: Starting
1116 19:03:13.305541 (+18893815us) tablet_metrics.cc:371] ProbeStats: bloom_lookups=1269,key_file_lookups=1263,delta_file_lookups=3457,mrs_lookups=0
1116 19:03:13.310397 (+ 4856us) log.cc:587] Serialized 5027 byte log entry
1116 19:03:13.311747 (+ 1350us) write_transaction.cc:308] Releasing row and schema locks
1116 19:03:13.324116 (+ 12369us) write_transaction.cc:276] Released schema lock
1116 19:03:13.326006 (+ 1890us) write_transaction.cc:195] FINISH: updating metrics

--
To view, visit http://gerrit.cloudera.org:8080/11945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8d146d4b83c8d488d3c1766a53fe4a4d322b6590
Gerrit-Change-Number: 11945
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 16 Nov 2018 21:57:04 +
Gerrit-HasComments: Yes
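
To make the point of the trace above concrete: nearly all of the call's wall time sits in the single gap between "APPLY: Starting" and the ProbeStats line. The following is only a rough sketch of that arithmetic, not anything from the Kudu code base. The names TraceStep, TxnTrace, SlowestStep and ShareOfTotal are hypothetical, and the deltas are copied by hand from a subset of the 'txn' trace lines.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one trace line: the label of the line and the
// "(+Nus)" delta printed on it, i.e. time elapsed since the previous line.
struct TraceStep {
  std::string label;
  int64_t delta_us;
};

// A hand-copied subset of the 'txn' trace from the comment above.
std::vector<TraceStep> TxnTrace() {
  return {
      {"PREPARE: Decoding operations", 35},
      {"PREPARE: Acquiring locks for 626 operations", 30617},
      {"PREPARE: locks acquired", 108071},
      {"APPLY: Starting", 12108},
      {"ProbeStats", 18893815},  // gap since "APPLY: Starting"
      {"Releasing row and schema locks", 1350},
      {"FINISH: updating metrics", 1890},
  };
}

// Returns the step with the largest delta. Requires a non-empty vector.
const TraceStep& SlowestStep(const std::vector<TraceStep>& steps) {
  assert(!steps.empty());
  return *std::max_element(
      steps.begin(), steps.end(),
      [](const TraceStep& a, const TraceStep& b) {
        return a.delta_us < b.delta_us;
      });
}

// Fraction of the total call time spent in one step.
double ShareOfTotal(int64_t step_us, int64_t total_us) {
  assert(total_us > 0);
  return static_cast<double>(step_us) / static_cast<double>(total_us);
}
```

Calling SlowestStep(TxnTrace()) picks the ProbeStats line, and ShareOfTotal(18893815, 19082504), using the total from the "Queueing success response" line, comes out at roughly 0.99. That matches the explanation in the comment: the time goes into applying the updates, not into queueing or lock acquisition.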
[kudu-CR] [tests] fix flake in TestRandomHistoryGCWorkload
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11945 )

Change subject: [tests] fix flake in TestRandomHistoryGCWorkload
..

Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11945/1/src/kudu/integration-tests/tablet_history_gc-itest.cc
File src/kudu/integration-tests/tablet_history_gc-itest.cc:

http://gerrit.cloudera.org:8080/#/c/11945/1/src/kudu/integration-tests/tablet_history_gc-itest.cc@519
PS1, Line 519: # if !defined(THREAD_SANITIZER)
: OverrideFlagForSlowTests("test_num_rounds",
: Substitute("$0", FLAGS_test_num_rounds * 5));
: # endif
So before this change, in TSAN mode, the test would time out due to the high number of rounds?

--
To view, visit http://gerrit.cloudera.org:8080/11945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8d146d4b83c8d488d3c1766a53fe4a4d322b6590
Gerrit-Change-Number: 11945
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 16 Nov 2018 21:13:26 +
Gerrit-HasComments: Yes
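
For readers unfamiliar with the pattern in the quoted snippet, here is a minimal standalone sketch of what the guard amounts to: the round count is multiplied by 5 only when the build is not a ThreadSanitizer build. It is not the Kudu implementation. THREAD_SANITIZER is the macro from the snippet; BuiltWithTsan, EffectiveNumRounds and kSlowTestMultiplier are hypothetical names. The sketch also assumes the override only takes effect when slow tests are enabled, as the helper's name suggests.

```cpp
#include <cassert>

// Mirrors the "* 5" in the quoted snippet.
constexpr int kSlowTestMultiplier = 5;

// True when the translation unit is compiled with THREAD_SANITIZER defined,
// the macro the quoted snippet checks.
constexpr bool BuiltWithTsan() {
#if defined(THREAD_SANITIZER)
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: returns how many rounds the scenario would run.
// The multiplier applies only when slow tests are enabled (an assumption)
// and the build is not a TSAN build.
int EffectiveNumRounds(int base_rounds, bool slow_tests_enabled, bool tsan) {
  assert(base_rounds >= 0);
  if (slow_tests_enabled && !tsan) {
    return base_rounds * kSlowTestMultiplier;
  }
  return base_rounds;
}
```

With a base of 10 rounds, EffectiveNumRounds(10, true, BuiltWithTsan()) gives 50 in a regular build and stays at 10 when THREAD_SANITIZER is defined, which keeps the slower TSAN runs from piling up the extra rounds.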
[kudu-CR] [tests] fix flake in TestRandomHistoryGCWorkload
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11945 )

Change subject: [tests] fix flake in TestRandomHistoryGCWorkload
..

[tests] fix flake in TestRandomHistoryGCWorkload

This patch fixes a flake most prominent in TSAN builds for the
RandomizedTabletHistoryGcITest.TestRandomHistoryGCWorkload scenario
of the tablet_history_gc-itest suite.

Before:
  dist_test --stress_cpu_threads=16: 256 out of 256 failed
  http://dist-test.cloudera.org/job?job_id=aserbin.1542394875.56611

After:
  dist_test --stress_cpu_threads=16: 0 out of 256 failed
  http://dist-test.cloudera.org/job?job_id=aserbin.1542397028.71131

In addition, this patch also contains a minor code clean up of the
tablet_history_gc-itest suite to take advantage of the 'using'
declarations.

Change-Id: I8d146d4b83c8d488d3c1766a53fe4a4d322b6590
---
M src/kudu/integration-tests/tablet_history_gc-itest.cc
1 file changed, 15 insertions(+), 13 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/11945/1

--
To view, visit http://gerrit.cloudera.org:8080/11945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8d146d4b83c8d488d3c1766a53fe4a4d322b6590
Gerrit-Change-Number: 11945
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin