[jira] [Updated] (CASSANDRA-8732) Make inter-node timeouts tolerate clock skew and drift

2018-11-18 Thread C. Scott Andreas (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

C. Scott Andreas updated CASSANDRA-8732:

Component/s: Streaming and Messaging

> Make inter-node timeouts tolerate clock skew and drift
> --
>
> Key: CASSANDRA-8732
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8732
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Streaming and Messaging
>Reporter: Ariel Weisberg
>Priority: Major
> Attachments: maximalskew.png
>
>
> Right now internode timeouts rely on currentTimeMillis() (and NTP) to make 
> sure that tasks don't expire before they arrive.
> Every receiver needs to deduce the offset between its nanoTime and the remote 
> nanoTime. I don't think currentTimeMillis is a good choice because it is 
> designed to be manipulated by operators and NTP. I would probably be 
> comfortable assuming that nanoTime isn't going to move in significant ways 
> without something that could be classified as operator error happening.
> I suspect the one timing method you can rely on being accurate is nanoTime 
> within a node (on average) and that a node can report on its own scheduling 
> jitter (on average).
> Finding the offset requires knowing what the network latency is in one 
> direction.
> One way to do this would be to periodically send a ping request which 
> generates a series of ping responses at fixed intervals (maybe by UDP?). The 
> responses should be corrected for scheduling jitter since the fixed intervals 
> may not be exactly achieved by the sender. By measuring the time deviation 
> between ping responses and their expected arrival time (based on the 
> interval) and correcting for the remotely reported scheduling jitter, you 
> should be able to measure latency in one direction.
> A weighted moving average (only correct for drift, not readjustment) of these 
> measurements would eventually converge on a close answer and would not be 
> impacted by outlier measurements. It may also make sense to drop the largest 
> N samples to improve accuracy.
> Once you know network latency you can add that to the timestamp of each ping 
> and compare to the local clock and know what the offset is.
> These measurements won't calculate the offset to be too small (timeouts fire 
> early), but could calculate the offset to be too large (timeouts fire late). 
> The conditions where the offset won't be accurate are the conditions 
> where you also want them firing reliably. This and bootstrapping in bad 
> conditions is what I am most uncertain of.
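
The offset calculation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from Cassandra: the class ClockOffsetEstimator, its methods, and the alpha weight are all hypothetical names, and the one-way latency is assumed to have been measured already by the ping scheme in the description. It drops the largest N latency samples before averaging, derives an offset sample from each ping (local receive time minus remote send time minus latency), smooths the offset with a weighted moving average, and uses it to translate a remote nanoTime deadline into local nanoTime.

```java
import java.util.Arrays;

/** Hypothetical sketch of per-peer nanoTime offset tracking; not a Cassandra API. */
final class ClockOffsetEstimator {
    private final double alpha;   // weight given to the newest offset sample
    private long offsetNanos;     // estimated (local nanoTime - remote nanoTime)
    private boolean initialized;

    ClockOffsetEstimator(double alpha) {
        this.alpha = alpha;
    }

    /** Mean latency after discarding the dropLargest biggest samples. */
    static long trimmedMean(long[] samples, int dropLargest) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int keep = sorted.length - dropLargest;
        if (keep <= 0) {
            throw new IllegalArgumentException("no samples left after trimming");
        }
        long sum = 0;
        for (int i = 0; i < keep; i++) {
            sum += sorted[i];
        }
        return sum / keep;
    }

    /** Fold one ping into the weighted moving average of the offset. */
    void onPing(long remoteSendNanos, long localReceiveNanos, long oneWayLatencyNanos) {
        // Remote timestamp plus latency is the remote clock at arrival;
        // comparing that to the local clock gives one offset sample.
        long sample = localReceiveNanos - remoteSendNanos - oneWayLatencyNanos;
        if (!initialized) {
            offsetNanos = sample;
            initialized = true;
        } else {
            offsetNanos += Math.round(alpha * (sample - offsetNanos));
        }
    }

    long offsetNanos() {
        return offsetNanos;
    }

    /** Translate a deadline in the remote node's nanoTime into local nanoTime. */
    long toLocalNanos(long remoteDeadlineNanos) {
        return remoteDeadlineNanos + offsetNanos;
    }
}
```

With this arithmetic, overestimating the latency makes the computed offset smaller and underestimating it makes the offset larger, so the direction of the latency error decides whether a translated deadline lands earlier or later on the local clock.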



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (CASSANDRA-8732) Make inter-node timeouts tolerate clock skew and drift

2015-02-18 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-8732:

Attachment: maximalskew.png

 Make inter-node timeouts tolerate clock skew and drift
 --

 Key: CASSANDRA-8732
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8732
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ariel Weisberg
 Attachments: maximalskew.png







[jira] [Updated] (CASSANDRA-8732) Make inter-node timeouts tolerate clock skew and drift

2015-02-03 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-8732:
--
Summary: Make inter-node timeouts tolerate clock skew and drift  (was: Make 
inter-node timeouts tolerate time skew)

 Make inter-node timeouts tolerate clock skew and drift
 --

 Key: CASSANDRA-8732
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8732
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ariel Weisberg



