[kudu-CR] WIP KUDU-1127 Don't hang scanner threads waiting for safe time
David Ribeiro Alves has abandoned this change. Change subject: WIP KUDU-1127 Don't hang scanner threads waiting for safe time .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/5305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Ic7cd0b0749e715c5d9e665a8e37d0f1c95af574e Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP KUDU-1127 Don't hang scanner threads waiting for safe time
Alexey Serbin has posted comments on this change. Change subject: WIP KUDU-1127 Don't hang scanner threads waiting for safe time .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/5305/4/src/kudu/integration-tests/consistency-itest.cc File src/kudu/integration-tests/consistency-itest.cc: PS4, Line 639: 3 Interesting, why 1 row is not enough? -- To view, visit http://gerrit.cloudera.org:8080/5305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic7cd0b0749e715c5d9e665a8e37d0f1c95af574e Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP KUDU-1127 Don't hang scanner threads waiting for safe time
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5305 to look at the new patch set (#4). Change subject: WIP KUDU-1127 Don't hang scanner threads waiting for safe time .. WIP KUDU-1127 Don't hang scanner threads waiting for safe time This makes it so we don't hang scanner threads on snapshot reads when the difference between safe time and a requested timestamp is bigger than the difference between 'now' and the client's deadline. This makes the server return a Status::ServiceUnavailable() with a THROTTLED error back to the client so that it can retry. This allowed to swap linked_list-test to finish with snapshot scans in the present and have it run on dist-test for 500 loops without errors. WIP as this requires a small directed test. Change-Id: Ic7cd0b0749e715c5d9e665a8e37d0f1c95af574e --- M src/kudu/consensus/time_manager.cc M src/kudu/integration-tests/consistency-itest.cc M src/kudu/integration-tests/linked_list-test-util.h M src/kudu/integration-tests/linked_list-test.cc M src/kudu/tserver/tablet_service.cc 5 files changed, 153 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/5305/4 -- To view, visit http://gerrit.cloudera.org:8080/5305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic7cd0b0749e715c5d9e665a8e37d0f1c95af574e Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP KUDU-1127 Don't hang scanner threads waiting for safe time
Alexey Serbin has posted comments on this change. Change subject: WIP KUDU-1127 Don't hang scanner threads waiting for safe time .. Patch Set 3: (10 comments) http://gerrit.cloudera.org:8080/#/c/5305/3/src/kudu/consensus/time_manager.cc File src/kudu/consensus/time_manager.cc: PS3, Line 126: MonoTime safe_time = clock_->PhysicalComponentOfTimestamp(last_safe_ts_copy_); : MonoTime requested_time = clock_->PhysicalComponentOfTimestamp(timestamp); > Instead of the line of conversions like It seems how there is GetPhysicalComponentDifference(), right? http://gerrit.cloudera.org:8080/#/c/5305/3/src/kudu/integration-tests/linked_list-test-util.h File src/kudu/integration-tests/linked_list-test-util.h: PS3, Line 594: snapshot_timestamp == kSnapshotAtNow This is also kNoSnapshot constant, if I'm not mistaken. What if this method called with kNoSnapshot? PS3, Line 608: snapshot_timestamp != kSnapshotAtNow ditto http://gerrit.cloudera.org:8080/#/c/5305/3/src/kudu/integration-tests/linked_list-test.cc File src/kudu/integration-tests/linked_list-test.cc: PS3, Line 88: vector common_flags; : : common_flags.push_back("--skip_remove_old_recovery_dir"); : : // Set history retention to one day, so that we don't GC history in this test. : // We rely on verifying "back in time" with snapshot scans. : common_flags.push_back("--tablet_history_max_age_sec=86400"); : common_flags.push_back("--scanner_max_safe_time_wait_msecs=2"); nit: while you at it, consider replacing sequence of push_back() calls with initializers list for 'common_flags'. http://gerrit.cloudera.org:8080/#/c/5305/3/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: PS3, Line 1383: ' nit: remove PS3, Line 1384: Status::OK() if it's not This wording and the name of the method look contradictory to me. Is there any typo by chance? PS3, Line 1386: tablet::HistoryGcOpts history_gc_opts = tablet->GetHistoryGcOpts(); Nit: consider moving this into the if (read_mode == READ_AT_SNAPSHOT) {} scope to avoid making this unnecessary call in other modes. PS3, Line 1796: nit: is it worth adding a comma here? PS3, Line 1796: ahm nit: AHM PS3, Line 1809: - MonoDelta::FromMilliseconds(10) Why is it necessary to have this extra-margin? May be, it's worth dropping it? If not, it would be nice to have a comment why it's here. -- To view, visit http://gerrit.cloudera.org:8080/5305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic7cd0b0749e715c5d9e665a8e37d0f1c95af574e Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP KUDU-1127 Don't hang scanner threads waiting for safe time
David Ribeiro Alves has restored this change. Change subject: WIP KUDU-1127 Don't hang scanner threads waiting for safe time .. Restored oops abandoned wrong patch -- To view, visit http://gerrit.cloudera.org:8080/5305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: restore Gerrit-Change-Id: Ic7cd0b0749e715c5d9e665a8e37d0f1c95af574e Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP KUDU-1127 Don't hang scanner threads waiting for safe time
David Ribeiro Alves has abandoned this change. Change subject: WIP KUDU-1127 Don't hang scanner threads waiting for safe time .. Abandoned decided to go another way -- To view, visit http://gerrit.cloudera.org:8080/5305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Ic7cd0b0749e715c5d9e665a8e37d0f1c95af574e Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP KUDU-1127 Don't hang scanner threads waiting for safe time
Alexey Serbin has posted comments on this change. Change subject: WIP KUDU-1127 Don't hang scanner threads waiting for safe time .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/5305/3/src/kudu/consensus/time_manager.cc File src/kudu/consensus/time_manager.cc: PS3, Line 126: MonoTime safe_time = clock_->PhysicalComponentOfTimestamp(last_safe_ts_copy_); : MonoTime requested_time = clock_->PhysicalComponentOfTimestamp(timestamp); Instead of the line of conversions like (Timestamp, Timestamp) --> (MonoTime, MonoTime) --> MonoDelta and creating MonoTime objects in different domains, why not to implement operator-(const Timestamp&, const Timestamp&) for getting difference of timestamps and then just do MonoDelta diff(MonoDelta::FromMicroseconds(clock.GetPhysicalValueMicros(t1 - t0))) ? -- To view, visit http://gerrit.cloudera.org:8080/5305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic7cd0b0749e715c5d9e665a8e37d0f1c95af574e Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP KUDU-1127 Don't hang scanner threads waiting for safe time
Alexey Serbin has posted comments on this change. Change subject: WIP KUDU-1127 Don't hang scanner threads waiting for safe time .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/5305/3/src/kudu/consensus/time_manager.cc File src/kudu/consensus/time_manager.cc: PS3, Line 130: requested_time.GetDeltaSince Could you use the operator-(x, y) here: MonoDelta safe_time_diff = requested_time - safe_time; ? PS3, Line 131: deadline.GetDeltaSince ditto -- To view, visit http://gerrit.cloudera.org:8080/5305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic7cd0b0749e715c5d9e665a8e37d0f1c95af574e Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP KUDU-1127 Don't hang scanner threads waiting for safe time
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5305 to look at the new patch set (#2). Change subject: WIP KUDU-1127 Don't hang scanner threads waiting for safe time .. WIP KUDU-1127 Don't hang scanner threads waiting for safe time This makes it so we don't hang scanner threads on snapshot reads when the difference between safe time and a requested timestamp is bigger than the difference between 'now' and the client's deadline. This makes the server return a Status::ServiceUnavailable() with a THROTTLED error back to the client so that it can retry. This allowed to swap linked_list-test to finish with snapshot scans in the present and have it run on dist-test for 500 loops without errors. WIP as this requires a small directed test. Change-Id: Ic7cd0b0749e715c5d9e665a8e37d0f1c95af574e --- M src/kudu/consensus/time_manager.cc M src/kudu/integration-tests/linked_list-test-util.h M src/kudu/integration-tests/linked_list-test.cc M src/kudu/tserver/tablet_service.cc 4 files changed, 158 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/5305/2 -- To view, visit http://gerrit.cloudera.org:8080/5305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic7cd0b0749e715c5d9e665a8e37d0f1c95af574e Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP KUDU-1127 Don't hang scanner threads waiting for safe time
David Ribeiro Alves has posted comments on this change. Change subject: WIP KUDU-1127 Don't hang scanner threads waiting for safe time .. Patch Set 1: btw I think the solution for that would actually more in the line of making time manager call an "update me" callback that would make the consensus replica try and get an update from the leader regarding safe time. After all theres a 1/3 chance that this would happen on another replica if we just returned and let the client retry. -- To view, visit http://gerrit.cloudera.org:8080/5305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic7cd0b0749e715c5d9e665a8e37d0f1c95af574e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] WIP KUDU-1127 Don't hang scanner threads waiting for safe time
David Ribeiro Alves has posted comments on this change. Change subject: WIP KUDU-1127 Don't hang scanner threads waiting for safe time .. Patch Set 1: (1 comment) Yeah this doesn't try to make it not wait at all, just tries to avoid waiting when it would likely be pointless because safe time is so far in the past that it's probably not getting to get to 'timestamp' before the client's deadline expires. I assume that in most cases the client's deadline would be bigger than the heartbeat interval. http://gerrit.cloudera.org:8080/#/c/5305/1//COMMIT_MSG Commit Message: Line 15: This allowed to swap linked_list-test to finish with snapshot scans > why not merge the test change in, so this goes in with its end-to-end test sure, will do. think it's still worth a unit test or would that be enough? -- To view, visit http://gerrit.cloudera.org:8080/5305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic7cd0b0749e715c5d9e665a8e37d0f1c95af574e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP KUDU-1127 Don't hang scanner threads waiting for safe time
Todd Lipcon has posted comments on this change. Change subject: WIP KUDU-1127 Don't hang scanner threads waiting for safe time .. Patch Set 1: (3 comments) seems like a reasonable heuristic. The only kinda funny thing is that in the normal case, where the safetime moves stepwise every raft heartbeat (eg once every 500ms or once a second), then the heuristic has the opposite effect from desired. In other words, just after the safetime has been updated, we won't reject anything (even though that's precisely the time when the next update is farthest off). Put another way, there are basically two "modes" to worry about. In the lagging/abandoned mode, the current time gets farther and farther ahead of the safetime, and thus it's reasonable to assume "the longer we have been abandoned, the more likely we are to be abandoned for a longer time". In the non-failure mode, it's the opposite "the longer we've been waiting for a heartbeat, the more likely it is that our next heartbeat is about to arrive". I don't know if it's worth trying to adjust for this based on knowledge of the raft heartbeat interval or empirical knowledge of the timing between the (n-2th) safetime update and the (n-1th) update, but maybe worth a note in the code about this weird effect? http://gerrit.cloudera.org:8080/#/c/5305/1//COMMIT_MSG Commit Message: Line 15: This allowed to swap linked_list-test to finish with snapshot scans why not merge the test change in, so this goes in with its end-to-end test coverage? http://gerrit.cloudera.org:8080/#/c/5305/1/src/kudu/consensus/time_manager.cc File src/kudu/consensus/time_manager.cc: PS1, Line 133: LOG(WARNING) probably worth throttling this, otherwise a server that got abandoned might spew these warnings PS1, Line 135: deadline remaining time budget? remaining timeout? -- To view, visit http://gerrit.cloudera.org:8080/5305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic7cd0b0749e715c5d9e665a8e37d0f1c95af574e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP KUDU-1127 Don't hang scanner threads waiting for safe time
David Ribeiro Alves has uploaded a new change for review. http://gerrit.cloudera.org:8080/5305 Change subject: WIP KUDU-1127 Don't hang scanner threads waiting for safe time .. WIP KUDU-1127 Don't hang scanner threads waiting for safe time This makes it so we don't hang scanner threads on snapshot reads when the difference between safe time and a requested timestamp is bigger than the difference between 'now' and the client's deadline. This makes the server return a Status::ServiceUnavailable() with a THROTTLED error back to the client so that it can retry. This allowed to swap linked_list-test to finish with snapshot scans in the present and have it run on dist-test for 500 loops without errors. WIP as this requires a small directed test. Change-Id: Ic7cd0b0749e715c5d9e665a8e37d0f1c95af574e --- M src/kudu/consensus/time_manager.cc M src/kudu/tserver/tablet_service.cc 2 files changed, 49 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/5305/1 -- To view, visit http://gerrit.cloudera.org:8080/5305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ic7cd0b0749e715c5d9e665a8e37d0f1c95af574e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves