[kudu-CR] KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls

2018-04-17 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9897 )

Change subject: KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls
..

KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls

This test was flaky under stress+ASAN with three cases of failures:

(1) most commonly, the assertion to check that an RPC had timed out in the
queue would run too quickly, just before the actual metric was incremented.
Changing it to ASSERT_EVENTUALLY got rid of most of the failures.

(2) in a few cases, the 100ms sleeps were processed completely before the
later RPC was processed, so the RPC that was supposed to time out in the
queue actually had a chance to run. Switching to 1000ms sleeps got rid
of the remaining failure case.

(3) in a few other cases, the call that was supposed to time out (with a
1ms timeout) timed out before it was even sent to the server. This patch bumps
the timeout to 5ms and retries that until it succeeds with a server-side
timeout.

Ran 5000x with the following command:

./build-support/dist_test.py loop -n 500 build/latest/bin/rpc_stub-test \
  --gtest_filter=\*DontHandle\* --gtest_repeat=10 --stress-cpu-threads=8

and got no failures after this patch, whereas I'd usually get 5-10 failures
prior.

Change-Id: If8cfebb5e9160fb11586abdbba6fc1ae06574a2c
Reviewed-on: http://gerrit.cloudera.org:8080/9897
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo 
---
M src/kudu/rpc/rpc_stub-test.cc
1 file changed, 23 insertions(+), 9 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If8cfebb5e9160fb11586abdbba6fc1ae06574a2c
Gerrit-Change-Number: 9897
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls

2018-04-17 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9897 )

Change subject: KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If8cfebb5e9160fb11586abdbba6fc1ae06574a2c
Gerrit-Change-Number: 9897
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 17 Apr 2018 17:16:28 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls

2018-04-16 Thread Todd Lipcon (Code Review)
Hello Andrew Wong, Kudu Jenkins, Andrew Wong, Sailesh Mukil,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9897

to look at the new patch set (#3).

Change subject: KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls
..

KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls

This test was flaky under stress+ASAN with three cases of failures:

(1) most commonly, the assertion to check that an RPC had timed out in the
queue would run too quickly, just before the actual metric was incremented.
Changing it to ASSERT_EVENTUALLY got rid of most of the failures.

(2) in a few cases, the 100ms sleeps were processed completely before the
later RPC was processed, so the RPC that was supposed to time out in the
queue actually had a chance to run. Switching to 1000ms sleeps got rid
of the remaining failure case.

(3) in a few other cases, the call that was supposed to time out (with a
1ms timeout) timed out before it was even sent to the server. This patch bumps
the timeout to 5ms and retries that until it succeeds with a server-side
timeout.

Ran 5000x with the following command:

./build-support/dist_test.py loop -n 500 build/latest/bin/rpc_stub-test \
  --gtest_filter=\*DontHandle\* --gtest_repeat=10 --stress-cpu-threads=8

and got no failures after this patch, whereas I'd usually get 5-10 failures
prior.

Change-Id: If8cfebb5e9160fb11586abdbba6fc1ae06574a2c
---
M src/kudu/rpc/rpc_stub-test.cc
1 file changed, 23 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/97/9897/3
--
To view, visit http://gerrit.cloudera.org:8080/9897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If8cfebb5e9160fb11586abdbba6fc1ae06574a2c
Gerrit-Change-Number: 9897
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls

2018-04-16 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9897 )

Change subject: KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9897/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9897/1//COMMIT_MSG@9
PS1, Line 9: two
> three?
Done


http://gerrit.cloudera.org:8080/#/c/9897/1/src/kudu/rpc/rpc_stub-test.cc
File src/kudu/rpc/rpc_stub-test.cc:

http://gerrit.cloudera.org:8080/#/c/9897/1/src/kudu/rpc/rpc_stub-test.cc@460
PS1, Line 460: 1000
> This value doesn't matter, does it?
nope. commented.


http://gerrit.cloudera.org:8080/#/c/9897/1/src/kudu/rpc/rpc_stub-test.cc@464
PS1, Line 464: // Since our timeout was short, it's possible in rare 
circumstances
 : // that we time out the RPC on the outbound queue, in which 
case
 : // we won't trigger the desired behavior here.
> Hmm, this seems to be explaining the reasoning for the ASSERT_EVENTUALLY, i
the comment is referring to the check for 'SENT' below. I clarified it.



--
To view, visit http://gerrit.cloudera.org:8080/9897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If8cfebb5e9160fb11586abdbba6fc1ae06574a2c
Gerrit-Change-Number: 9897
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 16 Apr 2018 23:23:55 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls

2018-04-16 Thread Todd Lipcon (Code Review)
Hello Andrew Wong, Kudu Jenkins, Andrew Wong, Sailesh Mukil,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9897

to look at the new patch set (#2).

Change subject: KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls
..

KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls

This test was flaky under stress+ASAN with three cases of failures:

(1) most commonly, the assertion to check that an RPC had timed out in the
queue would run too quickly, just before the actual metric was incremented.
Changing it to ASSERT_EVENTUALLY got rid of most of the failures.

(2) in a few cases, the 100ms sleeps were processed completely before the
later RPC was processed, so the RPC that was supposed to time out in the
queue actually had a chance to run. Switching to 1000ms sleeps got rid
of the remaining failure case.

(3) in a few other cases, the call that was supposed to time out (with a
1ms timeout) timed out before it was even sent to the server. This patch bumps
the timeout to 5ms and retries that until it succeeds with a server-side
timeout.

Ran 5000x with the following command:

./build-support/dist_test.py loop -n 500 build/latest/bin/rpc_stub-test \
  --gtest_filter=\*DontHandle\* --gtest_repeat=10 --stress-cpu-threads=8

and got no failures after this patch, whereas I'd usually get 5-10 failures
prior.

Change-Id: If8cfebb5e9160fb11586abdbba6fc1ae06574a2c
---
M src/kudu/rpc/rpc_stub-test.cc
1 file changed, 22 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/97/9897/2
--
To view, visit http://gerrit.cloudera.org:8080/9897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If8cfebb5e9160fb11586abdbba6fc1ae06574a2c
Gerrit-Change-Number: 9897
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls

2018-04-02 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9897 )

Change subject: KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls
..


Patch Set 1: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9897/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9897/1//COMMIT_MSG@17
PS1, Line 17: Switching to 1000ms sleeps
> The total test time is still just 1 second on average so I didn't think it
nvm, I didn't see that you were sending only 'n_worker_threads_' RPCs. If you 
sent more RPCs than service threads, then the test would take longer, but 
that's not the case.



--
To view, visit http://gerrit.cloudera.org:8080/9897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If8cfebb5e9160fb11586abdbba6fc1ae06574a2c
Gerrit-Change-Number: 9897
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 03 Apr 2018 03:24:22 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls

2018-04-02 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9897 )

Change subject: KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9897/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9897/1//COMMIT_MSG@9
PS1, Line 9: two
three?


http://gerrit.cloudera.org:8080/#/c/9897/1/src/kudu/rpc/rpc_stub-test.cc
File src/kudu/rpc/rpc_stub-test.cc:

http://gerrit.cloudera.org:8080/#/c/9897/1/src/kudu/rpc/rpc_stub-test.cc@460
PS1, Line 460: 1000
This value doesn't matter, does it?


http://gerrit.cloudera.org:8080/#/c/9897/1/src/kudu/rpc/rpc_stub-test.cc@464
PS1, Line 464: // Since our timeout was short, it's possible in rare 
circumstances
 : // that we time out the RPC on the outbound queue, in which 
case
 : // we won't trigger the desired behavior here.
Hmm, this seems to be explaining the reasoning for the ASSERT_EVENTUALLY, in 
which case it probably belongs up on L456?



--
To view, visit http://gerrit.cloudera.org:8080/9897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If8cfebb5e9160fb11586abdbba6fc1ae06574a2c
Gerrit-Change-Number: 9897
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 03 Apr 2018 02:40:38 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls

2018-04-02 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9897 )

Change subject: KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9897/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9897/1//COMMIT_MSG@17
PS1, Line 17: Switching to 1000ms sleeps
> Could this make it a slow test on an average case? And if yes, should we ad
The total test time is still just 1 second on average so I didn't think it was 
necessary to make it "slow"



--
To view, visit http://gerrit.cloudera.org:8080/9897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If8cfebb5e9160fb11586abdbba6fc1ae06574a2c
Gerrit-Change-Number: 9897
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 03 Apr 2018 00:06:48 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls

2018-04-02 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9897 )

Change subject: KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9897/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9897/1//COMMIT_MSG@17
PS1, Line 17: Switching to 1000ms sleeps
Could this make it a slow test on an average case? And if yes, should we add it 
to the list of slow tests?



--
To view, visit http://gerrit.cloudera.org:8080/9897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If8cfebb5e9160fb11586abdbba6fc1ae06574a2c
Gerrit-Change-Number: 9897
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 02 Apr 2018 23:54:58 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls

2018-04-02 Thread Todd Lipcon (Code Review)
Hello Andrew Wong, Sailesh Mukil,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/9897

to review the following change.


Change subject: KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls
..

KUDU-2394. Fix flaky RpcStubTest.TestDontHandleTimedOutCalls

This test was flaky under stress+ASAN with two cases of failures:

(1) most commonly, the assertion to check that an RPC had timed out in the
queue would run too quickly, just before the actual metric was incremented.
Changing it to ASSERT_EVENTUALLY got rid of most of the failures.

(2) in a few cases, the 100ms sleeps were processed completely before the
later RPC was processed, so the RPC that was supposed to time out in the
queue actually had a chance to run. Switching to 1000ms sleeps got rid
of the remaining failure case.

(3) in a few other cases, the call that was supposed to time out (with a
1ms timeout) timed out before it was even sent to the server. This patch bumps
the timeout to 5ms and retries that until it succeeds with a server-side
timeout.

Ran 5000x with the following command:

./build-support/dist_test.py loop -n 500 build/latest/bin/rpc_stub-test \
  --gtest_filter=\*DontHandle\* --gtest_repeat=10 --stress-cpu-threads=8

and got no failures after this patch, whereas I'd usually get 5-10 failures
prior.

Change-Id: If8cfebb5e9160fb11586abdbba6fc1ae06574a2c
---
M src/kudu/rpc/rpc_stub-test.cc
1 file changed, 19 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/97/9897/1
--
To view, visit http://gerrit.cloudera.org:8080/9897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If8cfebb5e9160fb11586abdbba6fc1ae06574a2c
Gerrit-Change-Number: 9897
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Sailesh Mukil