Todd Lipcon has posted comments on this change.
Change subject: WIP KUDU-1127 Don't hang scanner threads waiting for safe time
Patch Set 1:
Seems like a reasonable heuristic.
The only kinda funny thing is that in the normal case, where the safetime moves
stepwise every Raft heartbeat (e.g. once every 500ms or once a second), the
heuristic has the opposite effect from the one desired. In other words, just
after the safetime has been updated, we won't reject anything (even though
that's precisely the time when the next update is farthest off).
Put another way, there are basically two "modes" to worry about. In the
lagging/abandoned mode, the current time gets farther and farther ahead of the
safetime, and thus it's reasonable to assume "the longer we have been
abandoned, the more likely we are to be abandoned for a longer time". In the
non-failure mode, it's the opposite: "the longer we've been waiting for a
heartbeat, the more likely it is that our next heartbeat is about to arrive".
I don't know if it's worth trying to adjust for this based on knowledge of the
Raft heartbeat interval or empirical knowledge of the timing between the
(n-2)th safetime update and the (n-1)th update, but maybe it's worth a note in
the code about this weird effect?
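For illustration only, here is a rough sketch of what the "empirical" variant could look like. Everything in it is hypothetical and not taken from the patch: the class name SafeTimeAdvanceEstimator, the slack_factor parameter, and the choice to wait when there is no history are all assumptions. It remembers the gap between the last two safetime updates and uses it to tell the two modes apart.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;
using Millis = std::chrono::milliseconds;

// Hypothetical helper (not part of Kudu): remembers when safetime last
// advanced and the gap between the last two advances.
class SafeTimeAdvanceEstimator {
 public:
  // Call whenever safetime is observed to advance.
  void RecordAdvance(Clock::time_point now) {
    if (has_advance_) {
      assert(now >= last_advance_);
      last_interval_ = std::chrono::duration_cast<Millis>(now - last_advance_);
      has_interval_ = true;
    }
    last_advance_ = now;
    has_advance_ = true;
  }

  // Returns true if a scan with 'budget' left should wait for safetime.
  // 'slack_factor' is an assumed tuning knob: how many missed intervals
  // we tolerate before treating the replica as lagging/abandoned.
  bool ShouldWait(Clock::time_point now, Millis budget,
                  int slack_factor = 3) const {
    if (!has_interval_) return true;  // No history yet: assume we can wait.
    const Millis since =
        std::chrono::duration_cast<Millis>(now - last_advance_);
    if (since > last_interval_ * slack_factor) {
      // Lagging/abandoned mode: a long silence predicts more silence.
      return false;
    }
    // Non-failure mode: the next update is expected about one interval
    // after the previous one, so the wait shrinks as time passes.
    const Millis expected_wait = std::max(Millis(0), last_interval_ - since);
    return expected_wait <= budget;
  }

 private:
  bool has_advance_ = false;
  bool has_interval_ = false;
  Clock::time_point last_advance_;
  Millis last_interval_{0};
};
```

With updates 500ms apart, a scan arriving 100ms after the last update expects to wait roughly 400ms, so it would wait with a 1s budget and be rejected with a 100ms one. After several seconds of silence it would be rejected regardless of budget.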
Line 15: This allowed to swap linked_list-test to finish with snapshot scans
Why not merge the test change in, so this goes in with its end-to-end test?
PS1, Line 133: LOG(WARNING)
probably worth throttling this, otherwise a server that got abandoned might
spew these warnings
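As a sketch of what time-based throttling could look like (glog's LOG_EVERY_N is count-based, which may not be what we want here): the WarningThrottler class below is a hypothetical standalone helper, not an existing Kudu or glog API, and the one-second period in the usage note is an arbitrary assumption.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical helper: allows at most one log line per 'period' and counts
// how many were suppressed in between. Safe to share between threads.
class WarningThrottler {
 public:
  using Clock = std::chrono::steady_clock;

  explicit WarningThrottler(std::chrono::milliseconds period)
      : period_(period) {
    assert(period.count() >= 0);
  }

  // Returns true if the caller should log at 'now'. On true, '*suppressed'
  // is set to the number of calls swallowed since the last emitted line.
  bool ShouldLog(Clock::time_point now, int* suppressed) {
    std::lock_guard<std::mutex> l(lock_);
    if (has_logged_ && now - last_logged_ < period_) {
      ++suppressed_;
      return false;
    }
    *suppressed = suppressed_;
    suppressed_ = 0;
    last_logged_ = now;
    has_logged_ = true;
    return true;
  }

 private:
  const std::chrono::milliseconds period_;
  std::mutex lock_;
  bool has_logged_ = false;
  Clock::time_point last_logged_;
  int suppressed_ = 0;
};
```

A call site would keep one throttler per warning (for example a static instance constructed with a one-second period), call ShouldLog(WarningThrottler::Clock::now(), &n) before the LOG(WARNING), and could append the suppressed count to the message.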
PS1, Line 135: deadline
Maybe "remaining time budget" or "remaining timeout" instead?
To view, visit http://gerrit.cloudera.org:8080/5305
Gerrit-Owner: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>