[jira] [Commented] (CASSANDRA-4038) Investigate improving the dynamic snitch with reservoir sampling

2012-07-30 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13424990#comment-13424990
 ] 

Brandon Williams commented on CASSANDRA-4038:
-

That's a decent percentage increase, but ~0.001 ms per request is still pretty 
minuscule.  LGTM, +1.

 Investigate improving the dynamic snitch with reservoir sampling
 

 Key: CASSANDRA-4038
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4038
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Pavel Yaskevich
 Fix For: 1.2

 Attachments: CASSANDRA-4038.patch


 The dynamic snitch's UPDATES_PER_INTERVAL and WINDOW_SIZE are chosen somewhat 
 arbitrarily.  A better fit may be something similar to Metrics' 
 ExponentiallyDecayingSample, where more recent information is weighted 
 more heavily than past information.  Reservoir sampling would also be an 
 efficient way of keeping a statistically significant sample, rather than 
 refusing updates after UPDATES_PER_INTERVAL and keeping only WINDOW_SIZE 
 samples.
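The idea in the description can be illustrated with a minimal, single-threaded Java sketch of an exponentially decaying reservoir using forward-decay priority sampling, the technique behind Metrics' ExponentiallyDecayingSample. Each value gets priority w/u, where w = exp(alpha * (t - landmark)) and u is uniform in (0, 1]; the reservoir keeps the `capacity` highest priorities, so newer values are more likely to survive. The class name, its parameters, and the TreeMap-based storage are assumptions for illustration; this is neither the Metrics code nor the attached patch, and it leaves out thread safety and the timer that would trigger rescaling.

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical sketch: single-threaded exponentially decaying reservoir
// (forward-decay priority sampling). Not the Metrics or Cassandra code.
class DecayingReservoir {
    private final int capacity;   // maximum number of retained values
    private final double alpha;   // decay rate per second (assumed tuning knob)
    private final Random random;
    private long landmarkSeconds; // reference time for forward decay
    // priority -> sampled value; the lowest priority is evicted first
    private final TreeMap<Double, Double> samples = new TreeMap<>();

    DecayingReservoir(int capacity, double alpha, long landmarkSeconds, Random random) {
        this.capacity = capacity;
        this.alpha = alpha;
        this.landmarkSeconds = landmarkSeconds;
        this.random = random;
    }

    void update(double value, long nowSeconds) {
        // Weight grows exponentially with time since the landmark,
        // so newer values tend to receive higher priorities.
        double weight = Math.exp(alpha * (nowSeconds - landmarkSeconds));
        double u = 1.0 - random.nextDouble(); // uniform in (0, 1]
        double priority = weight / u;
        if (samples.size() < capacity) {
            samples.putIfAbsent(priority, value);
        } else if (samples.firstKey() < priority
                && samples.putIfAbsent(priority, value) == null) {
            samples.pollFirstEntry(); // evict the lowest-priority value
        }
    }

    // Scales every priority by exp(-alpha * (now - landmark)) and moves the
    // landmark so exp() does not overflow; relative order is preserved.
    void rescale(long nowSeconds) {
        double factor = Math.exp(-alpha * (nowSeconds - landmarkSeconds));
        TreeMap<Double, Double> rescaled = new TreeMap<>();
        for (Map.Entry<Double, Double> e : samples.entrySet()) {
            rescaled.put(e.getKey() * factor, e.getValue());
        }
        samples.clear();
        samples.putAll(rescaled);
        landmarkSeconds = nowSeconds;
    }

    double mean() {
        if (samples.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (double v : samples.values()) {
            sum += v;
        }
        return sum / samples.size();
    }

    int size() {
        return samples.size();
    }
}
```

Unlike a fixed window, every timing is offered to the reservoir; alpha controls how quickly older timings lose out to newer ones.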

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4038) Investigate improving the dynamic snitch with reservoir sampling

2012-07-30 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425001#comment-13425001
 ] 

Jonathan Ellis commented on CASSANDRA-4038:
---

How does this affect the math in the original phi accrual failure detector?  Is 
it worth getting Paul to look into that?





[jira] [Commented] (CASSANDRA-4038) Investigate improving the dynamic snitch with reservoir sampling

2012-07-30 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425006#comment-13425006
 ] 

Brandon Williams commented on CASSANDRA-4038:
-

It doesn't, really.  Instead of using a fixed sample size we use a 
statistically accurate continuous sample.  The math using the value is the same.
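To make this concrete under a simplifying assumption: if the sampled intervals are treated as exponentially distributed (the original phi accrual paper by Hayashibara et al. uses a normal distribution instead), phi depends on the sample only through its mean $\bar{x}$, with $t$ the time since the last arrival. Changing how the sample is collected changes $\bar{x}$, not the formula:

$$
P_{\text{later}}(t) = e^{-t/\bar{x}}, \qquad
\varphi(t) = -\log_{10} P_{\text{later}}(t) = \frac{t}{\bar{x}\,\ln 10}
$$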





[jira] [Commented] (CASSANDRA-4038) Investigate improving the dynamic snitch with reservoir sampling

2012-07-29 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13424597#comment-13424597
 ] 

Brandon Williams commented on CASSANDRA-4038:
-

bq. Yes, I did a few profiling tests and I see ~30 ms degradation in 
receiveTiming

This is micros, right?





[jira] [Commented] (CASSANDRA-4038) Investigate improving the dynamic snitch with reservoir sampling

2012-07-29 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13424621#comment-13424621
 ] 

Pavel Yaskevich commented on CASSANDRA-4038:


No, it's milliseconds: the old implementation takes ~80 ms for 100,000 inserts 
and the new one ~109 ms for the same number.
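Working through these figures per insert (plain arithmetic on the numbers quoted, nothing re-measured):

$$
\frac{109\,\text{ms} - 80\,\text{ms}}{100{,}000} = 0.00029\,\text{ms} \approx 0.3\,\mu\text{s added per insert}, \qquad
\frac{109\,\text{ms}}{100{,}000} \approx 0.0011\,\text{ms per insert in total}, \qquad
\frac{109 - 80}{80} \approx 36\%
$$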





[jira] [Commented] (CASSANDRA-4038) Investigate improving the dynamic snitch with reservoir sampling

2012-07-27 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423924#comment-13423924
 ] 

Brandon Williams commented on CASSANDRA-4038:
-

I'm a bit concerned that shoehorning latency timings from a double into a long 
will always yield zero on a healthy gigabit network, where the timings are 
generally fractional.  But there's a good chance that, with such similar 
values, their weight is irrelevant after CASSANDRA-3722 anyway.
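A small Java illustration of the concern: a narrowing `(long)` cast truncates toward zero, so any sub-unit latency becomes 0. Scaling to a finer unit before converting is one possible workaround; the milliseconds unit, the method names, and the sample values are assumptions for illustration, not measurements and not what the patch does.

```java
// Hypothetical illustration of the precision concern; not Cassandra code.
class LatencyPrecision {
    // Narrowing cast: Java truncates toward zero, so 0.35 becomes 0.
    static long truncateToLong(double latency) {
        return (long) latency;
    }

    // One possible alternative (an assumption, not the patch): scale to a
    // finer unit first, e.g. milliseconds to microseconds, then round.
    static long scaleThenRound(double latencyMillis) {
        return Math.round(latencyMillis * 1000.0);
    }

    public static void main(String[] args) {
        double[] lanLatenciesMillis = {0.12, 0.35, 0.8}; // made-up sample values
        for (double ms : lanLatenciesMillis) {
            System.out.println(ms + " ms: cast=" + truncateToLong(ms)
                    + ", micros=" + scaleThenRound(ms));
        }
    }
}
```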

Have you done any profiling to see if this actually is cheaper than the fixed 
window size?  Specifically I'm worried about receiveTiming becoming more 
expensive.





[jira] [Commented] (CASSANDRA-4038) Investigate improving the dynamic snitch with reservoir sampling

2012-07-27 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423959#comment-13423959
 ] 

Pavel Yaskevich commented on CASSANDRA-4038:


bq. Have you done any profiling to see if this actually is cheaper than the 
fixed window size? Specifically I'm worried about receiveTiming becoming more 
expensive.

Yes, I did a few profiling tests and I see ~30 ms degradation in receiveTiming 
speed when inserting 10 latency records (I increased the UPDATES_PER_INTERVAL 
value to make the comparison fair).





[jira] [Commented] (CASSANDRA-4038) Investigate improving the dynamic snitch with reservoir sampling

2012-07-13 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413662#comment-13413662
 ] 

Pavel Yaskevich commented on CASSANDRA-4038:


I think it's worth pursuing, as it would remove the work we currently do 
restricting sampling to the window size and the number of updates per interval, 
and calculating the age of each response arrival, while also improving sampling 
by moving to an exponential decay function. There is already an implementation 
available under the Apache 2.0 License: 
https://github.com/codahale/metrics/blob/master/metrics-core/src/main/java/com/yammer/metrics/stats/ExponentiallyDecayingSample.java
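For contrast, the current approach described here (a cap on updates per interval plus a bounded window of recent timings) might look roughly like the sketch below. The class name, the `add`/`resetInterval` methods, and the parameter values used are hypothetical stand-ins for UPDATES_PER_INTERVAL and WINDOW_SIZE; this is not the actual dynamic snitch code.

```java
import java.util.ArrayDeque;

// Hypothetical sketch of a fixed-window latency tracker; not Cassandra's code.
class FixedWindowTracker {
    private final int updatesPerInterval; // analogous to UPDATES_PER_INTERVAL
    private final int windowSize;         // analogous to WINDOW_SIZE
    private final ArrayDeque<Double> window = new ArrayDeque<>();
    private int updatesThisInterval = 0;

    FixedWindowTracker(int updatesPerInterval, int windowSize) {
        this.updatesPerInterval = updatesPerInterval;
        this.windowSize = windowSize;
    }

    // Returns false when the timing is refused because the interval cap was hit.
    boolean add(double latency) {
        if (updatesThisInterval >= updatesPerInterval) {
            return false;
        }
        updatesThisInterval++;
        if (window.size() == windowSize) {
            window.removeFirst(); // drop the oldest timing
        }
        window.addLast(latency);
        return true;
    }

    // Would be driven by a periodic timer (not shown) to open a new interval.
    void resetInterval() {
        updatesThisInterval = 0;
    }

    double mean() {
        if (window.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (double v : window) {
            sum += v;
        }
        return sum / window.size();
    }
}
```

Timings arriving after the cap is reached are simply discarded until the next reset, and only the last `windowSize` accepted timings influence the mean.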
