[jira] [Updated] (CASSANDRA-8732) Make inter-node timeouts tolerate clock skew and drift
[ https://issues.apache.org/jira/browse/CASSANDRA-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

C. Scott Andreas updated CASSANDRA-8732:
    Component/s: Streaming and Messaging

> Make inter-node timeouts tolerate clock skew and drift
> --
>
>         Key: CASSANDRA-8732
>         URL: https://issues.apache.org/jira/browse/CASSANDRA-8732
>     Project: Cassandra
>  Issue Type: Improvement
>  Components: Streaming and Messaging
>    Reporter: Ariel Weisberg
>    Priority: Major
> Attachments: maximalskew.png
>
> Right now internode timeouts rely on currentTimeMillis() (and NTP) to make sure that tasks don't expire before they arrive.
> Every receiver needs to deduce the offset between its nanoTime and the remote nanoTime. I don't think currentTimeMillis is a good choice because it is designed to be manipulated by operators and NTP. I would be reasonably comfortable assuming that nanoTime isn't going to move in significant ways without something that could be classified as operator error happening.
> I suspect the one timing method you can rely on being accurate (on average) is nanoTime within a node, and that a node can report on its own scheduling jitter (on average).
> Finding the offset requires knowing the network latency in one direction.
> One way to do this would be to periodically send a ping request which generates a series of ping responses at fixed intervals (maybe over UDP?). The responses should be corrected for scheduling jitter, since the fixed intervals may not be exactly achieved by the sender. By measuring the deviation between each ping response's arrival and its expected arrival time (based on the interval), and correcting for the remotely reported scheduling jitter, you should be able to measure latency in one direction.
> A weighted moving average (which corrects only for drift, not readjustment) of these measurements would eventually converge on a close answer without being skewed by outlier measurements. It may also make sense to drop the largest N samples to improve accuracy.
> Once you know the network latency you can add it to the timestamp of each ping, compare against the local clock, and derive the offset.
> These measurements won't calculate the offset to be too small (timeouts fire early), but could calculate the offset to be too large (timeouts fire late). The conditions where the offset won't be accurate are exactly the conditions where you also want timeouts firing reliably. This, and bootstrapping under bad conditions, is what I am most uncertain of.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
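The jitter-corrected deviation measurement the ticket describes might look like the following sketch. It is illustrative only, not Cassandra code: the PingResponse fields, the fixed 10 ms interval, and the choice of the first arrival as the baseline are all assumptions layered onto the description.

{code:java}
import java.util.ArrayList;
import java.util.List;

final class PingResponse {
    final long arrivalNanos;       // receiver's System.nanoTime() when the response arrived
    final long senderJitterNanos;  // sender-reported scheduling jitter for this response
    PingResponse(long arrivalNanos, long senderJitterNanos) {
        this.arrivalNanos = arrivalNanos;
        this.senderJitterNanos = senderJitterNanos;
    }
}

final class PingTrain {
    static final long INTERVAL_NANOS = 10_000_000L; // assumed 10 ms spacing between responses

    // For each response, compute how far it arrived from its expected slot
    // (first arrival + i * interval), then subtract the jitter the sender says
    // it introduced. What remains is attributable to the network path.
    static List<Long> networkDeviations(List<PingResponse> train) {
        List<Long> deviations = new ArrayList<>(train.size());
        long base = train.get(0).arrivalNanos;
        for (int i = 0; i < train.size(); i++) {
            PingResponse r = train.get(i);
            long expected = base + (long) i * INTERVAL_NANOS;
            deviations.add((r.arrivalNanos - expected) - r.senderJitterNanos);
        }
        return deviations;
    }
}
{code}

The deviations this produces are relative to the first response; per the ticket, combining them with the sender's reported send times is what would turn them into an absolute one-way latency.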
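For the averaging step, here is a minimal sketch of a weighted moving average that drops the largest N samples per batch, under the assumption that an exponentially weighted mean is an acceptable reading of "weighted moving average" (class and parameter names are invented):

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LatencyEstimate {
    private final double alpha;    // weight given to each new sample, e.g. 0.05
    private final int dropLargest; // number of largest samples discarded per batch
    private double meanNanos = Double.NaN;

    LatencyEstimate(double alpha, int dropLargest) {
        this.alpha = alpha;
        this.dropLargest = dropLargest;
    }

    // Fold a batch of latency samples into the moving average, first dropping
    // the largest N: spikes from queueing or GC pauses would otherwise bias
    // the estimate upward and make timeouts fire late.
    void addBatch(List<Long> samplesNanos) {
        List<Long> sorted = new ArrayList<>(samplesNanos);
        Collections.sort(sorted);
        int keep = Math.max(0, sorted.size() - dropLargest);
        for (long sample : sorted.subList(0, keep)) {
            meanNanos = Double.isNaN(meanNanos)
                      ? sample
                      : alpha * sample + (1 - alpha) * meanNanos;
        }
    }

    double meanNanos() { return meanNanos; }
}
{code}

A small alpha makes the estimate track slow drift while staying stable against individual noisy batches, which matches the ticket's "only correct for drift, not readjustment" caveat.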
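Once a one-way latency estimate is in hand, deriving the offset and translating remote deadlines is plain arithmetic; a hedged illustration with invented names:

{code:java}
final class ClockOffset {
    // Offset of the remote node's nanoTime relative to ours: the ping left at
    // remoteSendNanos (remote clock), spent oneWayLatencyNanos on the wire,
    // and arrived at localArrivalNanos (local clock).
    static long offsetNanos(long remoteSendNanos, long oneWayLatencyNanos, long localArrivalNanos) {
        return localArrivalNanos - (remoteSendNanos + oneWayLatencyNanos);
    }

    // Translate a deadline stamped in the remote node's nanoTime into local
    // nanoTime so the receiver can decide whether a task has already expired.
    static long toLocalNanos(long remoteNanos, long offsetNanos) {
        return remoteNanos + offsetNanos;
    }
}
{code}

With such an offset, a receiver could convert a task's remote expiry into its own nanoTime domain before deciding whether to drop it, which is the goal the ticket states up front.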
[jira] [Updated] (CASSANDRA-8732) Make inter-node timeouts tolerate clock skew and drift
[ https://issues.apache.org/jira/browse/CASSANDRA-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict updated CASSANDRA-8732:
    Attachment: maximalskew.png

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8732) Make inter-node timeouts tolerate clock skew and drift
[ https://issues.apache.org/jira/browse/CASSANDRA-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-8732:
    Summary: Make inter-node timeouts tolerate clock skew and drift  (was: Make inter-node timeouts tolerate time skew)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)