[kudu-CR] [TestWorkload] an option to retry on read timeouts

2018-02-20 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9295 )

Change subject: [TestWorkload] an option to retry on read timeouts
..


Patch Set 4:

Thanks you Todd and Mike for the reviews.  Yes, I agree it makes sense to 
verify if that was an exposure of some issues on the client side.  I'll take a 
look at that a bit later, maybe in the scope of wrapping up along with 
https://gerrit.cloudera.org/c/7037/


--
To view, visit http://gerrit.cloudera.org:8080/9295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I54a6b81e4ba3a55676f1fc97cd577a5fe545c550
Gerrit-Change-Number: 9295
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 20 Feb 2018 22:29:09 +
Gerrit-HasComments: No


[kudu-CR] [TestWorkload] an option to retry on read timeouts

2018-02-20 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9295 )

Change subject: [TestWorkload] an option to retry on read timeouts
..


Patch Set 4: Code-Review+2

agreed w/ Todd use of this may be indicative of scanner retry bugs


--
To view, visit http://gerrit.cloudera.org:8080/9295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I54a6b81e4ba3a55676f1fc97cd577a5fe545c550
Gerrit-Change-Number: 9295
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 20 Feb 2018 22:24:32 +
Gerrit-HasComments: No


[kudu-CR] [TestWorkload] an option to retry on read timeouts

2018-02-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9295 )

Change subject: [TestWorkload] an option to retry on read timeouts
..


Patch Set 3:

> In the context of TestWorkload, it's the same story as with write operations 
> -- if a timeout happens with a write operation, TestWorkload can ignore that 
> instead of crashing and retry.  That's how it's implemented now, right?  So, 
> the idea here is to do the same with timeouts for scan/read operations.

I think in many of the cases where we set write_timeout_allowed it's because 
we've explicitly set a very low timeout and expect timeouts. We want to stress 
cases where writes are hitting servers as they are going through various 
lifecycle transitions and the best way to do that is to always be starting new 
writes rather than waiting on previously completed ones.

Skimming the cases where we set_timeout_allowed right now for writes, it seems 
like most of them are the above case where we are also setting the timeout to 
150ms, 250ms, etc. There are a few where we are not doing that, and I wonder 
whether this is indicative of bugs in our client retry logic and we should be 
working to remove them.

So I guess that's all a long way to say: if we need to add this, let's also add 
(or find an existing) JIRA that we have cases where we aren't properly failing 
over scans in the case of timeouts.


--
To view, visit http://gerrit.cloudera.org:8080/9295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I54a6b81e4ba3a55676f1fc97cd577a5fe545c550
Gerrit-Change-Number: 9295
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Feb 2018 22:53:59 +
Gerrit-HasComments: No


[kudu-CR] [TestWorkload] an option to retry on read timeouts

2018-02-13 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9295 )

Change subject: [TestWorkload] an option to retry on read timeouts
..


Patch Set 3:

> I'm not 100% following this. Reads should already have internal
 > retries since they're configured to be fault-tolerant, right? so
 > what's the purpose of retrying on a read timeout rather than just
 > setting the timeout to be longer so that the client's internal
 > retries have a longer budget? Is our issue that our internal client
 > doesn't do a good job? Maybe we should fix that, considering we
 > rely on this behavior for fault tolerance on long-running queries.

In the context of TestWorkload, it's the same story as with write operations -- 
if a timeout happens with a write operation, TestWorkload can ignore that 
instead of crashing and retry.  That's how it's implemented now, right?  So, 
the idea here is to do the same with timeouts for scan/read operations.

While running the raft_consensus_stress-itest in TSAN configuration, I noticed 
that scans just time out while waiting for the server to advance its safe time 
(they might be paused or it might be some furious re-election activity).  So, 
the idea is to avoid crashing in this case and to make the TestWorkload 
behaving the same way as with write operations when 'write_timeout_allowed_' is 
set.

Yes, I could set the read timeout to a huge value, but that would mean the test 
would attempt to do just a few scans and will be stuck with that for the rest 
of the time.  However, I think I want it to attempt scans/reads from as many 
servers as possible, thinking that it could trigger some other bugs to manifest 
themselves.

I'm not sure we need to address that from the client's side as is.

Another alternative would be to disable the reader thread for TestWorkload in 
that stress test.


--
To view, visit http://gerrit.cloudera.org:8080/9295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I54a6b81e4ba3a55676f1fc97cd577a5fe545c550
Gerrit-Change-Number: 9295
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Feb 2018 22:12:08 +
Gerrit-HasComments: No


[kudu-CR] [TestWorkload] an option to retry on read timeouts

2018-02-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9295 )

Change subject: [TestWorkload] an option to retry on read timeouts
..


Patch Set 3:

I'm not 100% following this. Reads should already have internal retries since 
they're configured to be fault-tolerant, right? so what's the purpose of 
retrying on a read timeout rather than just setting the timeout to be longer so 
that the client's internal retries have a longer budget? Is our issue that our 
internal client doesn't do a good job? Maybe we should fix that, considering we 
rely on this behavior for fault tolerance on long-running queries.


--
To view, visit http://gerrit.cloudera.org:8080/9295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I54a6b81e4ba3a55676f1fc97cd577a5fe545c550
Gerrit-Change-Number: 9295
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Feb 2018 20:33:15 +
Gerrit-HasComments: No


[kudu-CR] [TestWorkload] an option to retry on read timeouts

2018-02-13 Thread Alexey Serbin (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9295

to look at the new patch set (#3).

Change subject: [TestWorkload] an option to retry on read timeouts
..

[TestWorkload] an option to retry on read timeouts

Introduced a new option to retry timeouts on read/scan operations.
In some stress tests (like a test to reproduce KUDU-2274) this
proves to be useful.

Change-Id: I54a6b81e4ba3a55676f1fc97cd577a5fe545c550
---
M src/kudu/integration-tests/client-stress-test.cc
M src/kudu/integration-tests/client_failover-itest.cc
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/disk_reservation-itest.cc
M src/kudu/integration-tests/raft_consensus-itest-base.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/raft_consensus_election-itest.cc
M src/kudu/integration-tests/stop_tablet-itest.cc
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/integration-tests/webserver-stress-itest.cc
M src/kudu/tools/kudu-admin-test.cc
14 files changed, 53 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/9295/3
--
To view, visit http://gerrit.cloudera.org:8080/9295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I54a6b81e4ba3a55676f1fc97cd577a5fe545c550
Gerrit-Change-Number: 9295
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] [TestWorkload] an option to retry on read timeouts

2018-02-13 Thread Alexey Serbin (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9295

to look at the new patch set (#2).

Change subject: [TestWorkload] an option to retry on read timeouts
..

[TestWorkload] an option to retry on read timeouts

Introduced a new option to retry timeouts on read/scan operations.
In some stress tests (like a test to reproduce KUDU-2274) this
proves to be useful.

Change-Id: I54a6b81e4ba3a55676f1fc97cd577a5fe545c550
---
M src/kudu/integration-tests/client-stress-test.cc
M src/kudu/integration-tests/client_failover-itest.cc
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/disk_reservation-itest.cc
M src/kudu/integration-tests/raft_consensus-itest-base.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/raft_consensus_election-itest.cc
M src/kudu/integration-tests/stop_tablet-itest.cc
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/integration-tests/webserver-stress-itest.cc
M src/kudu/tools/kudu-admin-test.cc
14 files changed, 53 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/9295/2
--
To view, visit http://gerrit.cloudera.org:8080/9295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I54a6b81e4ba3a55676f1fc97cd577a5fe545c550
Gerrit-Change-Number: 9295
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins