[kudu-CR] [tests] fix flake in TestRandomHistoryGCWorkload

2018-11-16 Alexey Serbin (Code Review)
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11945 )

Change subject: [tests] fix flake in TestRandomHistoryGCWorkload
..

[tests] fix flake in TestRandomHistoryGCWorkload

This patch fixes a flake, most prominent in TSAN builds, in the
RandomizedTabletHistoryGcITest.TestRandomHistoryGCWorkload scenario
of the tablet_history_gc-itest suite.

Before: dist_test --stress_cpu_threads=16: 256 out of 256 failed
  http://dist-test.cloudera.org/job?job_id=aserbin.1542394875.56611

After:  dist_test --stress_cpu_threads=16:   0 out of 256 failed
  http://dist-test.cloudera.org/job?job_id=aserbin.1542397028.71131

In addition, this patch contains a minor cleanup of the
tablet_history_gc-itest suite to take advantage of 'using'
declarations.
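For readers unfamiliar with the cleanup mentioned above: a using-declaration imports a single name from a namespace into the current scope. The diff itself is not part of this thread, so the sketch below is a generic illustration rather than the actual change; the helper `MakeRowKeys` is hypothetical and relies only on the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Using-declarations bring in exactly the names listed, so the code below can
// write 'string' and 'vector' instead of 'std::string' and 'std::vector'
// without the wholesale import that 'using namespace std;' would perform.
using std::string;
using std::vector;

// Hypothetical helper (not taken from tablet_history_gc-itest.cc): builds row
// keys "key-0", "key-1", ..., "key-<count-1>".
vector<string> MakeRowKeys(std::size_t count) {
  vector<string> keys;
  keys.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    keys.push_back("key-" + std::to_string(i));
  }
  return keys;
}
```

Compared with `using namespace std;`, this keeps the set of imported names explicit, which is the usual reason to prefer it at the top of a .cc file.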

Change-Id: I8d146d4b83c8d488d3c1766a53fe4a4d322b6590
Reviewed-on: http://gerrit.cloudera.org:8080/11945
Reviewed-by: Adar Dembo 
Tested-by: Kudu Jenkins
---
M src/kudu/integration-tests/tablet_history_gc-itest.cc
1 file changed, 15 insertions(+), 13 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/11945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8d146d4b83c8d488d3c1766a53fe4a4d322b6590
Gerrit-Change-Number: 11945
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] [tests] fix flake in TestRandomHistoryGCWorkload

2018-11-16 Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11945 )

Change subject: [tests] fix flake in TestRandomHistoryGCWorkload
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11945/1/src/kudu/integration-tests/tablet_history_gc-itest.cc
File src/kudu/integration-tests/tablet_history_gc-itest.cc:

http://gerrit.cloudera.org:8080/#/c/11945/1/src/kudu/integration-tests/tablet_history_gc-itest.cc@519
PS1, Line 519: # if !defined(THREAD_SANITIZER)
             :   OverrideFlagForSlowTests("test_num_rounds",
             :                            Substitute("$0", FLAGS_test_num_rounds * 5));
             : # endif
> So before this change, in TSAN mode, the test would time out due to the hig
Yes -- the workload used in this scenario, combined with manual compaction and
flushing, leads to a situation where too many rowsets accumulate, and write
operations (especially updates) take longer and longer to complete. Eventually,
it gets really ugly:

W1116 19:03:13.326668  7226 rpcz_store.cc:253] Call kudu.tserver.TabletServerService.Write from 127.0.0.1:46428 (ReqId={client: ae42f5eb2c2344068da4093cccfe156c, seq_no=38, attempt_no=0}) took 19083 ms (19.1 s). Client timeout 1 ms (20 s)
W1116 19:03:13.329354  7226 rpcz_store.cc:259] Trace:
1116 19:02:54.243665 (+     0us) service_pool.cc:162] Inserting onto call queue
1116 19:02:54.243908 (+   243us) service_pool.cc:221] Handling call
1116 19:03:13.326412 (+19082504us) inbound_call.cc:162] Queueing success response
Related trace 'txn':
1116 19:02:54.251341 (+     0us) write_transaction.cc:100] PREPARE: Starting
1116 19:02:54.251519 (+   178us) write_transaction.cc:267] Acquiring schema lock in shared mode
1116 19:02:54.251572 (+    53us) write_transaction.cc:270] Acquired schema lock
1116 19:02:54.251607 (+    35us) tablet.cc:437] PREPARE: Decoding operations
1116 19:02:54.282224 (+ 30617us) tablet.cc:459] PREPARE: Acquiring locks for 626 operations
1116 19:02:54.390295 (+108071us) tablet.cc:463] PREPARE: locks acquired
1116 19:02:54.390316 (+    21us) write_transaction.cc:125] PREPARE: finished.
1116 19:02:54.390479 (+   163us) write_transaction.cc:135] Start()
1116 19:02:54.398643 (+  8164us) write_transaction.cc:140] Timestamp: P: 24 usec, L: 179
1116 19:02:54.399618 (+   975us) log.cc:587] Serialized 4555 byte log entry
1116 19:02:54.411726 (+ 12108us) write_transaction.cc:148] APPLY: Starting
1116 19:03:13.305541 (+18893815us) tablet_metrics.cc:371] ProbeStats: bloom_lookups=1269,key_file_lookups=1263,delta_file_lookups=3457,mrs_lookups=0
1116 19:03:13.310397 (+  4856us) log.cc:587] Serialized 5027 byte log entry
1116 19:03:13.311747 (+  1350us) write_transaction.cc:308] Releasing row and schema locks
1116 19:03:13.324116 (+ 12369us) write_transaction.cc:276] Released schema lock
1116 19:03:13.326006 (+  1890us) write_transaction.cc:195] FINISH: updating metrics
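The trace can be read with a little arithmetic. The sketch below only re-derives numbers printed in the trace (it does not talk to Kudu): the 626-operation batch spent 18893815 us of the 19082504 us handling time in the stretch that begins at "APPLY: Starting", and the ProbeStats counters give an average number of lookups per operation. Treating the ProbeStats counters as totals for the whole batch is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Figures copied from the trace; nothing here queries a tablet server.
constexpr int64_t kNumOps = 626;             // "Acquiring locks for 626 operations"
constexpr int64_t kBloomLookups = 1269;      // ProbeStats: bloom_lookups
constexpr int64_t kKeyFileLookups = 1263;    // ProbeStats: key_file_lookups
constexpr int64_t kDeltaFileLookups = 3457;  // ProbeStats: delta_file_lookups
constexpr int64_t kApplyUs = 18893815;       // delta on the ProbeStats line (since "APPLY: Starting")
constexpr int64_t kTotalUs = 19082504;       // "Handling call" -> "Queueing success response"

// Average number of lookups of one kind per write operation in the batch
// (assumes the ProbeStats counters cover the whole batch).
double PerOp(int64_t lookups) {
  return static_cast<double>(lookups) / static_cast<double>(kNumOps);
}

// Fraction of the RPC handling time spent in the apply stretch of the trace.
double ApplyShare() {
  return static_cast<double>(kApplyUs) / static_cast<double>(kTotalUs);
}

// Average apply time per operation, in microseconds.
double ApplyUsPerOp() {
  return static_cast<double>(kApplyUs) / static_cast<double>(kNumOps);
}
```

With these figures, `PerOp(kBloomLookups)` is about 2.03, `PerOp(kDeltaFileLookups)` is about 5.52, `ApplyShare()` is about 0.99, and `ApplyUsPerOp()` is roughly 30 ms. One plausible reading, consistent with the explanation above, is that nearly all of the 19.1 s goes into locating rows across the accumulated rowsets and their delta files, not into queueing, row locking (about 108 ms), or log serialization.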



--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8d146d4b83c8d488d3c1766a53fe4a4d322b6590
Gerrit-Change-Number: 11945
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 16 Nov 2018 21:57:04 +
Gerrit-HasComments: Yes


[kudu-CR] [tests] fix flake in TestRandomHistoryGCWorkload

2018-11-16 Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11945 )

Change subject: [tests] fix flake in TestRandomHistoryGCWorkload
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11945/1/src/kudu/integration-tests/tablet_history_gc-itest.cc
File src/kudu/integration-tests/tablet_history_gc-itest.cc:

http://gerrit.cloudera.org:8080/#/c/11945/1/src/kudu/integration-tests/tablet_history_gc-itest.cc@519
PS1, Line 519: # if !defined(THREAD_SANITIZER)
             :   OverrideFlagForSlowTests("test_num_rounds",
             :                            Substitute("$0", FLAGS_test_num_rounds * 5));
             : # endif
So before this change, in TSAN mode, the test would time out due to the high 
number of rounds?
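The guard quoted above keeps the 5x round multiplier out of TSAN builds. Below is a minimal, self-contained sketch of the same pattern, with two assumptions flagged. First, the quoted code tests a THREAD_SANITIZER macro that the Kudu build presumably defines for TSAN builds; when it is absent, the sketch falls back to the compilers' own detection macros (`__SANITIZE_THREAD__` in GCC, `__has_feature(thread_sanitizer)` in Clang). Second, `EffectiveNumRounds` and its `slow_tests_enabled` parameter are hypothetical stand-ins for `OverrideFlagForSlowTests`, which is assumed here to apply its override only when slow tests are enabled.

```cpp
#include <cassert>

// Assumption: outside Kudu's build, derive THREAD_SANITIZER from the
// compiler's own TSAN detection. The nested #if avoids evaluating
// __has_feature on compilers that do not provide it.
#if !defined(THREAD_SANITIZER)
#  if defined(__SANITIZE_THREAD__)
#    define THREAD_SANITIZER 1
#  elif defined(__has_feature)
#    if __has_feature(thread_sanitizer)
#      define THREAD_SANITIZER 1
#    endif
#  endif
#endif

// Hypothetical stand-in for the test's round count (FLAGS_test_num_rounds in
// the real test). 'slow_tests_enabled' models the slow-test switch that
// OverrideFlagForSlowTests is assumed to honor.
int EffectiveNumRounds(int base_rounds, bool slow_tests_enabled) {
  int rounds = base_rounds;
#if !defined(THREAD_SANITIZER)
  // Non-TSAN builds may run five times as many rounds in slow-test mode;
  // TSAN builds always keep the base count.
  if (slow_tests_enabled) {
    rounds *= 5;
  }
#endif
  return rounds;
}
```

Because the decision is made at compile time, a TSAN binary never runs the extended workload, even with slow tests enabled. That matches Alexey Serbin's reply in this thread: the extra rounds are what let rowsets pile up until writes time out.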



--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8d146d4b83c8d488d3c1766a53fe4a4d322b6590
Gerrit-Change-Number: 11945
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 16 Nov 2018 21:13:26 +
Gerrit-HasComments: Yes


[kudu-CR] [tests] fix flake in TestRandomHistoryGCWorkload

2018-11-16 Alexey Serbin (Code Review)
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11945 )


Change subject: [tests] fix flake in TestRandomHistoryGCWorkload
..

Change-Id: I8d146d4b83c8d488d3c1766a53fe4a4d322b6590
---
M src/kudu/integration-tests/tablet_history_gc-itest.cc
1 file changed, 15 insertions(+), 13 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/11945/1
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8d146d4b83c8d488d3c1766a53fe4a4d322b6590
Gerrit-Change-Number: 11945
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin