[jira] [Commented] (CASSANDRA-5780) nodetool status and ring report incorrect/stale information after decommission

2013-12-14 Thread Peter Haggerty (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848357#comment-13848357
 ] 

Peter Haggerty commented on CASSANDRA-5780:
---

We just ran into this again when a node rebooted and came back up thinking 
everything was fine, while every other node in the ring disagreed. It was 
resolved by our normal manual restart procedure (stop thrift, stop gossip, 
flush the node, drain the node, then restart cassandra), but it definitely 
caused some confusion that nodetool status and nodetool info reported the 
node as up and a working part of the cluster when in fact it wasn't.

The nodes in this state definitely do *not* make it clear that they are not 
part of the cluster anymore.

 nodetool status and ring report incorrect/stale information after decommission
 --

 Key: CASSANDRA-5780
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5780
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Peter Haggerty
Priority: Trivial
  Labels: lhf, ponies

 Cassandra 1.2.6 ring of 12 instances, each with 256 tokens.
 Decommission 3 of the 12 nodes, one after another, resulting in a 9-instance ring.
 The 9 instances of cassandra that are in the ring all correctly report 
 nodetool status information for the ring and have the same data.
 After the first node is decommissioned:
 nodetool status on decommissioned-1st reports 11 nodes
 After the second node is decommissioned:
 nodetool status on decommissioned-1st reports 11 nodes
 nodetool status on decommissioned-2nd reports 10 nodes
 After the third node is decommissioned:
 nodetool status on decommissioned-1st reports 11 nodes
 nodetool status on decommissioned-2nd reports 10 nodes
 nodetool status on decommissioned-3rd reports 9 nodes
 The storage load information is similarly stale on the various decommissioned 
 nodes. The nodetool status and ring commands continue to return information 
 as if the nodes were still part of a cluster, apparently repeating the last 
 information they saw.
 In contrast the nodetool info command fails with an exception, which isn't 
 ideal but at least indicates that there was a failure rather than returning 
 stale information.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Assigned] (CASSANDRA-6487) Log WARN on large batch sizes

2013-12-14 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis reassigned CASSANDRA-6487:
-

Assignee: Lyuben Todorov

 Log WARN on large batch sizes
 -

 Key: CASSANDRA-6487
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6487
 Project: Cassandra
  Issue Type: Improvement
Reporter: Patrick McFadin
Assignee: Lyuben Todorov
Priority: Minor

 Large batches on a coordinator can cause a lot of node stress. I propose 
 adding a WARN log entry if batch sizes go beyond a configurable size. This 
 will give operators more visibility into something that can happen on the 
 developer side. 
 New yaml setting with 5k default.
 {{# Log WARN on any batch size exceeding this value. 5k by default.}}
 {{# Caution should be taken on increasing the size of this threshold as it 
 can lead to node instability.}}
 {{batch_size_warn_threshold: 5k}}
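For illustration, a minimal sketch of the guard this proposes, assuming a hypothetical threshold field populated from the new yaml setting (the names and call site are not from an actual patch):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch: warn when a batch exceeds the configurable threshold.
public class BatchSizeWarning
{
    private static final Logger logger = LoggerFactory.getLogger(BatchSizeWarning.class);

    // Assumed to be populated from the proposed batch_size_warn_threshold setting.
    private final long warnThresholdBytes;

    public BatchSizeWarning(long warnThresholdBytes)
    {
        this.warnThresholdBytes = warnThresholdBytes;
    }

    // Called from the coordinator's batch execution path.
    public void check(long batchSizeBytes)
    {
        if (batchSizeBytes > warnThresholdBytes)
            logger.warn("Batch of size {} bytes exceeds batch_size_warn_threshold ({} bytes); " +
                        "large batches can destabilize the coordinator",
                        batchSizeBytes, warnThresholdBytes);
    }
}
{code}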



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (CASSANDRA-4288) prevent thrift server from starting before gossip has settled

2013-12-14 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848372#comment-13848372
 ] 

Jonathan Ellis commented on CASSANDRA-4288:
---

This fails to start in a single-node configuration since no gossip tasks are 
completed.

Other notes:

- Prefer Boolean.getBoolean instead of System.getProperty
- Prefer Uninterruptibles.sleepUninterruptibly to Thread.sleep (no try/catch 
req'd)
- Avoid logging at WARN when everything is working fine (here, minimum of 3 
WARN lines)
- Would like some kind of escape valve: Gossip still busy after N seconds?  
Time to start up anyway.
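To make these notes concrete, here is a rough sketch of a settle-wait using the suggested helpers, with an escape valve; the property name, poll counts, and gossip-activity accessor are all assumptions, not the attached patch:

{code}
import java.util.concurrent.TimeUnit;

import com.google.common.util.concurrent.Uninterruptibles;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class GossipSettle
{
    private static final Logger logger = LoggerFactory.getLogger(GossipSettle.class);

    // Boolean.getBoolean reads a -D system property directly, as suggested above.
    private static final boolean SKIP_WAIT =
        Boolean.getBoolean("cassandra.skip_wait_for_gossip_to_settle");

    // Hypothetical accessor for gossip-stage activity.
    public interface GossipActivity
    {
        long completedTasks();
    }

    // Wait until gossip has been idle for a few consecutive polls. A single-node
    // cluster settles immediately (no tasks ever complete, so every poll is idle),
    // and the maxWaitSeconds escape valve starts the node anyway if gossip stays busy.
    public static void waitToSettle(GossipActivity gossip, int requiredIdlePolls, int maxWaitSeconds)
    {
        if (SKIP_WAIT)
            return;

        int idlePolls = 0;
        for (int waited = 1; waited <= maxWaitSeconds; waited++)
        {
            long before = gossip.completedTasks();
            Uninterruptibles.sleepUninterruptibly(1, TimeUnit.SECONDS); // no try/catch needed
            idlePolls = (gossip.completedTasks() == before) ? idlePolls + 1 : 0;
            if (idlePolls >= requiredIdlePolls)
            {
                logger.info("Gossip settled after {}s", waited); // INFO, not WARN, on the happy path
                return;
            }
        }
        logger.warn("Gossip still busy after {}s; starting up anyway", maxWaitSeconds);
    }
}
{code}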

 prevent thrift server from starting before gossip has settled
 -

 Key: CASSANDRA-4288
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4288
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Peter Schuller
Assignee: Chris Burroughs
 Fix For: 2.0.4

 Attachments: CASSANDRA-4288-trunk.txt, j4288-1.2-v1-txt, 
 j4288-1.2-v2-txt


 A serious problem is that there is no co-ordination whatsoever between gossip 
 and the consumers of gossip. In particular, on a large cluster with hundreds 
 of nodes, it takes several seconds for gossip to settle because the gossip 
 stage is CPU bound. This leads to a node starting up and accepting thrift 
 traffic long before it has any clue of what is up and down. This leads to 
 client-visible timeouts (for nodes that are down but not identified as such) 
 and UnavailableException (for nodes that are up but not yet identified as 
 such). This is really bad in general, but in particular for clients doing 
 non-idempotent writes (counter increments).
 I was going to fix this as part of more significant re-writing in other 
 tickets having to do with gossip/topology/etc, but that's not going to 
 happen. So, the attached patch is roughly what we're running with in 
 production now to make restarts bearable. The minimum wait time ensures 
 that gossip has time to start becoming CPU bound if it is going to, and it 
 is large enough to allow down nodes to be identified as such in most 
 typical cases with a default phi conviction threshold (untested; we 
 actually ran with a smaller minimum of 5 seconds, but from past 
 experience I believe 15 seconds is enough).
 The patch is tested on our 1.1 branch. It applies on trunk, and the diff is 
 against trunk, but I have not tested it against trunk.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (CASSANDRA-6488) Batchlog writes consume unnecessarily large amounts of CPU on vnodes clusters

2013-12-14 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848379#comment-13848379
 ] 

Jonathan Ellis commented on CASSANDRA-6488:
---

NB: I'm not sure what the changes to candidates/chosenEndpoints do, so I've left 
them out for now.

 Batchlog writes consume unnecessarily large amounts of CPU on vnodes clusters
 -

 Key: CASSANDRA-6488
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6488
 Project: Cassandra
  Issue Type: Bug
Reporter: Rick Branson
Assignee: Aleksey Yeschenko
 Attachments: 6488-rbranson-patch.txt, 6488-v2.txt, graph (21).png


 The cloneOnlyTokenMap call in StorageProxy.getBatchlogEndpoints causes 
 enormous amounts of CPU to be consumed on clusters with many vnodes. I 
 created a patch to cache this data as a workaround and deployed it to a 
 production cluster with 15,000 tokens. CPU consumption dropped to 1/5th. This 
 highlights the overall issue with cloneOnlyTokenMap() calls on vnodes 
 clusters. I'm including the maybe-not-the-best-quality workaround patch to 
 use as a reference, but cloneOnlyTokenMap is a systemic issue and every place 
 it's called should probably be investigated.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (CASSANDRA-6488) Batchlog writes consume unnecessarily large amounts of CPU on vnodes clusters

2013-12-14 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-6488:
--

Attachment: 6488-v2.txt

v2 to move the caching logic inside cloneOnlyTokenMap
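For reference, a minimal sketch of this kind of caching (the version counter and class shape are assumptions; the real TokenMetadata is considerably more involved):

{code}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: memoize the expensive token-only clone and invalidate
// it when the ring changes, so repeated batchlog writes reuse one copy.
public class CachingTokenMetadata
{
    // Stand-in for the cloned token-only structure.
    public static class TokenMap {}

    private final AtomicLong ringVersion = new AtomicLong();
    private long cachedVersion = -1;
    private TokenMap cachedClone;

    public synchronized TokenMap cloneOnlyTokenMap()
    {
        long current = ringVersion.get();
        if (cachedClone == null || cachedVersion != current)
        {
            cachedClone = expensiveTokenOnlyClone(); // the costly call this ticket is about
            cachedVersion = current;
        }
        return cachedClone;
    }

    // Any ring mutation (join/leave/move) bumps the version and invalidates the cache.
    public void onRingChange()
    {
        ringVersion.incrementAndGet();
    }

    private TokenMap expensiveTokenOnlyClone()
    {
        return new TokenMap();
    }
}
{code}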

 Batchlog writes consume unnecessarily large amounts of CPU on vnodes clusters
 -

 Key: CASSANDRA-6488
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6488
 Project: Cassandra
  Issue Type: Bug
Reporter: Rick Branson
Assignee: Aleksey Yeschenko
 Attachments: 6488-rbranson-patch.txt, 6488-v2.txt, graph (21).png


 The cloneOnlyTokenMap call in StorageProxy.getBatchlogEndpoints causes 
 enormous amounts of CPU to be consumed on clusters with many vnodes. I 
 created a patch to cache this data as a workaround and deployed it to a 
 production cluster with 15,000 tokens. CPU consumption dropped to 1/5th. This 
 highlights the overall issue with cloneOnlyTokenMap() calls on vnodes 
 clusters. I'm including the maybe-not-the-best-quality workaround patch to 
 use as a reference, but cloneOnlyTokenMap is a systemic issue and every place 
 it's called should probably be investigated.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[1/3] git commit: r/m Test.iml

2013-12-14 Thread jbellis
Updated Branches:
  refs/heads/cassandra-2.0 a3796f5f7 -> bb09d3c1b
  refs/heads/trunk 14ebfbf7f -> 9533b587f


r/m Test.iml


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bb09d3c1
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bb09d3c1
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bb09d3c1

Branch: refs/heads/cassandra-2.0
Commit: bb09d3c1b9a08ab214c9e034002f5b64f1e0e43f
Parents: a3796f5
Author: Jonathan Ellis <jbel...@apache.org>
Authored: Sat Dec 14 09:57:02 2013 -0600
Committer: Jonathan Ellis <jbel...@apache.org>
Committed: Sat Dec 14 09:57:02 2013 -0600

--
 test/Test.iml | 214 -
 1 file changed, 214 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/bb09d3c1/test/Test.iml
--
diff --git a/test/Test.iml b/test/Test.iml
deleted file mode 100644
index fca23cc..0000000
--- a/test/Test.iml
+++ /dev/null
@@ -1,214 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<module type="JAVA_MODULE" version="4">
-  <component name="NewModuleRootManager" inherit-compiler-output="true">
-    <exclude-output />
-    <content url="file://$MODULE_DIR$">
-      <sourceFolder url="file://$MODULE_DIR$/unit" isTestSource="false" />
-      <sourceFolder url="file://$MODULE_DIR$/long" isTestSource="false" />
-      <sourceFolder url="file://$MODULE_DIR$/conf" isTestSource="false" />
-      <sourceFolder url="file://$MODULE_DIR$/pig" isTestSource="false" />
-    </content>
-    <orderEntry type="inheritedJdk" />
-    <orderEntry type="sourceFolder" forTests="false" />
-    <orderEntry type="module" module-name="Git-trunk" />
-    <orderEntry type="module-library" scope="RUNTIME">
-      <library>
-        <CLASSES>
-          <root url="file://$MODULE_DIR$/conf" />
-        </CLASSES>
-        <JAVADOC />
-        <SOURCES />
-      </library>
-    </orderEntry>
-  </component>
-  <component name="org.twodividedbyzero.idea.findbugs">
-    <option name="_basePreferences">
-      <map>
-        <entry key="property.analysisEffortLevel" value="default" />
-        <entry key="property.analyzeAfterCompile" value="false" />
-        <entry key="property.exportAsHtml" value="true" />
-        <entry key="property.exportAsXml" value="true" />
-        <entry key="property.exportBaseDir" value="" />
-        <entry key="property.exportCreateArchiveDir" value="false" />
-        <entry key="property.exportOpenBrowser" value="true" />
-        <entry key="property.minPriorityToReport" value="Medium" />
-        <entry key="property.runAnalysisInBackground" value="false" />
-        <entry key="property.showHiddenDetectors" value="false" />
-        <entry key="property.toolWindowToFront" value="true" />
-      </map>
-    </option>
-    <option name="_detectors">
-      <map>
-        <entry key="AppendingToAnObjectOutputStream" value="true" />
-        <entry key="BCPMethodReturnCheck" value="false" />
-        <entry key="BadAppletConstructor" value="false" />
-        <entry key="BadResultSetAccess" value="true" />
-        <entry key="BadSyntaxForRegularExpression" value="true" />
-        <entry key="BadUseOfReturnValue" value="true" />
-        <entry key="BadlyOverriddenAdapter" value="true" />
-        <entry key="BooleanReturnNull" value="true" />
-        <entry key="BuildInterproceduralCallGraph" value="false" />
-        <entry key="BuildObligationPolicyDatabase" value="true" />
-        <entry key="CallToUnsupportedMethod" value="false" />
-        <entry key="CalledMethods" value="true" />
-        <entry key="CheckCalls" value="false" />
-        <entry key="CheckExpectedWarnings" value="false" />
-        <entry key="CheckImmutableAnnotation" value="true" />
-        <entry key="CheckTypeQualifiers" value="true" />
-        <entry key="CloneIdiom" value="true" />
-        <entry key="ComparatorIdiom" value="true" />
-        <entry key="ConfusedInheritance" value="true" />
-        <entry key="ConfusionBetweenInheritedAndOuterMethod" value="true" />
-        <entry key="CrossSiteScripting" value="true" />
-        <entry key="DoInsideDoPrivileged" value="true" />
-        <entry key="DontCatchIllegalMonitorStateException" value="true" />
-        <entry key="DontIgnoreResultOfPutIfAbsent" value="true" />
-        <entry key="DontUseEnum" value="true" />
-        <entry key="DroppedException" value="true" />
-        <entry key="DumbMethodInvocations" value="true" />
-        <entry key="DumbMethods" value="true" />
-        <entry key="DuplicateBranches" value="true" />
-        <entry key="EmptyZipFileEntry" value="true" />
-        <entry key="EqStringTest" value="false" />
-        <entry key="EqualsOperandShouldHaveClassCompatibleWithThis" value="true" />
-        <entry key="FieldItemSummary" value="true" />
-        <entry key="FinalizerNullsFields" value="true" />
-        <entry key="FindBadCast" value="false" />
-        <entry key="FindBadCast2" value="true" />
-        <entry key="FindBadEqualsImplementation" value="false" />
-        <entry key="FindBadForLoop" value="true" />
-        <entry key="FindBugsSummaryStats" value="true" />
- 

[2/3] git commit: r/m Test.iml

2013-12-14 Thread jbellis
r/m Test.iml


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bb09d3c1
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bb09d3c1
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bb09d3c1

Branch: refs/heads/trunk
Commit: bb09d3c1b9a08ab214c9e034002f5b64f1e0e43f
Parents: a3796f5
Author: Jonathan Ellis <jbel...@apache.org>
Authored: Sat Dec 14 09:57:02 2013 -0600
Committer: Jonathan Ellis <jbel...@apache.org>
Committed: Sat Dec 14 09:57:02 2013 -0600

--
 test/Test.iml | 214 -
 1 file changed, 214 deletions(-)
--



[3/3] git commit: Merge branch 'cassandra-2.0' into trunk

2013-12-14 Thread jbellis
Merge branch 'cassandra-2.0' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/9533b587
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/9533b587
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/9533b587

Branch: refs/heads/trunk
Commit: 9533b587f0e04b496615ffb884cc2d7530799314
Parents: 14ebfbf bb09d3c
Author: Jonathan Ellis <jbel...@apache.org>
Authored: Sat Dec 14 09:57:06 2013 -0600
Committer: Jonathan Ellis <jbel...@apache.org>
Committed: Sat Dec 14 09:57:06 2013 -0600

--
 test/Test.iml | 214 -
 1 file changed, 214 deletions(-)
--




[jira] [Commented] (CASSANDRA-2238) Allow nodetool to print out hostnames given an option

2013-12-14 Thread Daneel S. Yaitskov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848435#comment-13848435
 ] 

Daneel S. Yaitskov commented on CASSANDRA-2238:
---

I've cherry-picked this commit into 1.2 trunk.
You can find it here: 
https://github.com/yaitskov/cassandra/tree/nodetool-status-resolve-ip-support-for-1.2
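A minimal sketch of the resolution such an option implies, using plain JDK reverse DNS (the flag name and helper are assumptions, not the cherry-picked code):

{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: render a node address as an IP or a hostname,
// depending on a nodetool flag such as a --resolve-ip option.
public final class AddressDisplay
{
    public static String format(String ip, boolean resolveHostnames)
    {
        if (!resolveHostnames)
            return ip;
        try
        {
            // getCanonicalHostName() does a reverse lookup and falls back
            // to the textual IP when no PTR record exists.
            return InetAddress.getByName(ip).getCanonicalHostName();
        }
        catch (UnknownHostException e)
        {
            return ip;
        }
    }
}
{code}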

 Allow nodetool to print out hostnames given an option
 -

 Key: CASSANDRA-2238
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2238
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Joaquin Casares
Priority: Trivial

 Give nodetool the option of either displaying IPs or hostnames for the nodes 
 in a ring.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


git commit: remove dead code

2013-12-14 Thread dbrosius
Updated Branches:
  refs/heads/trunk 9533b587f -> 5d167cf3d


remove dead code


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/5d167cf3
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/5d167cf3
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/5d167cf3

Branch: refs/heads/trunk
Commit: 5d167cf3df23c728034e43a01e7f5e6561094df4
Parents: 9533b58
Author: Dave Brosius <dbros...@mebigfatguy.com>
Authored: Sat Dec 14 18:37:13 2013 -0500
Committer: Dave Brosius <dbros...@mebigfatguy.com>
Committed: Sat Dec 14 18:37:13 2013 -0500

--
 .../org/apache/cassandra/db/compaction/LeveledManifest.java | 5 -
 1 file changed, 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/5d167cf3/src/java/org/apache/cassandra/db/compaction/LeveledManifest.java
--
diff --git a/src/java/org/apache/cassandra/db/compaction/LeveledManifest.java 
b/src/java/org/apache/cassandra/db/compaction/LeveledManifest.java
index 2ec42e4..4dab156 100644
--- a/src/java/org/apache/cassandra/db/compaction/LeveledManifest.java
+++ b/src/java/org/apache/cassandra/db/compaction/LeveledManifest.java
@@ -17,8 +17,6 @@
  */
 package org.apache.cassandra.db.compaction;
 
-import java.io.DataOutputStream;
-import java.io.FileOutputStream;
 import java.io.IOException;
 import java.util.*;
 
@@ -38,8 +36,6 @@ import org.apache.cassandra.db.RowPosition;
 import org.apache.cassandra.dht.Bounds;
 import org.apache.cassandra.dht.Token;
 import org.apache.cassandra.io.sstable.*;
-import org.apache.cassandra.io.util.FileUtils;
-import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.Pair;
 
 public class LeveledManifest
@@ -185,7 +181,6 @@ public class LeveledManifest
 private synchronized void sendBackToL0(SSTableReader sstable)
 {
 remove(sstable);
-String metaDataFile = sstable.descriptor.filenameFor(Component.STATS);
 try
 {
 
sstable.descriptor.getMetadataSerializer().mutateLevel(sstable.descriptor, 0);



[jira] [Commented] (CASSANDRA-6486) Latency Measurement

2013-12-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848498#comment-13848498
 ] 

Benedict commented on CASSANDRA-6486:
-

So, after thinking about this a little more, I may be leaning towards a 
slightly modified approach that avoids the per-thread allocation and dynamic 
resizing of ranges in favour of a single global reservoir that is updated 
directly by each thread. This has the disadvantage that the intervals you're 
timing are more difficult to define, but we really don't need that kind of 
paranoia about accuracy when measuring many-microsecond and above events. 

I'm currently thinking of using a rolling collection of sample-histograms (say 
10 per timer) to provide a rolling window over the desired measurement interval; 
on retiring the oldest sample-histogram, its contents can be merged into the 
next tier of interval we're measuring.

Alternatively we could take the same approach with just a regular histogram, 
but I currently prefer the sampled approach: even with larger windows than a 
histogram, the distribution for any window is more likely to closely 
approximate a normal distribution, and so should give a more accurate 
picture of latencies for the interval even with a very small sample size.
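A rough sketch of that rolling arrangement, under stated assumptions (the window count, sample sizes, and merge policy are illustrative, not a worked design):

{code}
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: keep ~10 sample sub-windows per timer as a rolling
// window; a retired sub-window is merged into the next, coarser tier.
public class RollingSampleHistogram
{
    private final Deque<long[]> window = new ArrayDeque<>();
    private final int subWindows;   // e.g. 10 per timer
    private final int sampleSize;   // samples kept per sub-window
    private final RollingSampleHistogram nextTier; // coarser interval, or null
    private long[] current;
    private int seen;

    public RollingSampleHistogram(int subWindows, int sampleSize, RollingSampleHistogram nextTier)
    {
        this.subWindows = subWindows;
        this.sampleSize = sampleSize;
        this.nextTier = nextTier;
        this.current = new long[sampleSize];
    }

    // Reservoir-sample a latency into the current sub-window (Algorithm R).
    public synchronized void update(long latencyMicros)
    {
        if (seen < sampleSize)
            current[seen] = latencyMicros;
        else
        {
            long slot = ThreadLocalRandom.current().nextLong(seen + 1);
            if (slot < sampleSize)
                current[(int) slot] = latencyMicros;
        }
        seen++;
    }

    // Called once per sub-interval: retire the oldest sub-window into the
    // next tier and start a fresh one.
    public synchronized void roll()
    {
        window.addLast(Arrays.copyOf(current, Math.min(seen, sampleSize)));
        if (window.size() > subWindows)
        {
            long[] retired = window.removeFirst();
            if (nextTier != null)
                for (long sample : retired)
                    nextTier.update(sample);
        }
        current = new long[sampleSize];
        seen = 0;
    }
}
{code}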



 Latency Measurement
 ---

 Key: CASSANDRA-6486
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6486
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Assignee: Benedict

 Latency measurement in Cassandra is currently suboptimal. Exactly what the 
 latency measurements tell you isn't intuitively clear due to their 
 exponentially decaying nature, but they amount to some view of the latency per 
 (unweighted) operation over approximately the past 10 minute period, with 
 greater weight given to more recent operations. This has some obvious flaws, 
 the most notable being that due to probabilistic sampling, large outlier 
 events (e.g. GC) can easily be lost over a multi-minute time horizon, and 
 even when caught are unlikely to appear even in the 99.9th percentile due to 
 accounting for a tiny fraction of events numerically.
 I'm generally thinking about how we might improve on this, and want to dump 
 my ideas here for discussion. I think the following things should be targeted:
 1) Ability to see uniform latency measurements for different time horizons 
 stretching back from the present, e.g. last 1s, 1m, 1hr and 1day
 2) Ability to bound the error margin of statistics for all of these intervals
 3) Protect against losing outlier measurements
 4) Possibly offer the ability to weight statistics, so that longer latencies 
 are not underplayed even if they are counted
 5) Preferably non-blocking, memory efficient, and relatively garbage-free
 (3) and (4) are the trickiest, as a theoretically sound and general approach 
 isn't immediately obvious. There are a number of possibilities that spring to 
 mind:
 1) ensure that we have enough sample points that we are probabilistically 
 guaranteed to not lose them, but over large time horizons this is problematic 
 due to memory constraints, and it doesn't address (4);
 2) count large events multiple times (or sub-slices of the events), based on 
 e.g. average op-rate. I am not a fan of this idea because it makes possibly 
 bad assumptions about behaviour and doesn't seem very theoretically sound;
 3) weight the probability of retaining an event by its length. the problem 
 with this approach is that it ties you into (4) without offering the current 
 view of statistics (i.e. unweighted operations), and it also doesn't lend 
 itself to efficient implementation
 I'm currently leaning towards a fourth approach, which attempts to hybridise 
 uniform sampling and histogram behaviour, by separating the sample space into 
 ranges, each some multiple of the last (say 2x the size). Each range has a 
 uniform sample of events that occurred in that range, plus a count of total 
 events. Ideally the size of the sample will be variable based on the number 
 of events occurring in any range, but that there will be a lower-bound 
 calculated to ensure we do not lose events.
 This approach lends itself to all 5 goals above:
 1) by maintaining the same structure for each time horizon, and uniformly 
 sampling from all of the directly lower order time horizons to maintain it;
 2) by imposing minimum sample sizes for each range;
 3) ditto (2);
 4) by producing time/frequency-weighted statistics using the samples and 
 counts from each range;
 5) with thread-local array-based timers that are synchronised with the global 
 timer once every minimum period, by the owning thread
 This also extends reasonably nicely the timers I have already written for 
 CASSANDRA-6199, so some of the work is already done.
 Thoughts / discussion would be welcome, especially if you think I've missed 
 another obvious approach.
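A compact sketch of this range-partitioned idea (the 2x range growth, per-range uniform sample, and per-range count come from the description above; all names and sizes are assumptions):

{code}
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: partition the latency space into ranges, each 2x the
// last; keep a uniform sample plus a total count per range so that outliers
// in sparse ranges are never lost and weighted statistics stay possible.
public class RangeSampledReservoir
{
    private final long[][] samples; // per-range uniform sample
    private final long[] counts;    // per-range total event count

    public RangeSampledReservoir(int ranges, int samplesPerRange)
    {
        samples = new long[ranges][samplesPerRange];
        counts = new long[ranges];
    }

    // Range i covers [2^i, 2^(i+1)), so each range is 2x the size of the last.
    private int rangeOf(long value)
    {
        int r = 63 - Long.numberOfLeadingZeros(Math.max(1, value));
        return Math.min(r, counts.length - 1);
    }

    public synchronized void update(long value)
    {
        int r = rangeOf(value);
        long n = ++counts[r];
        long[] sample = samples[r];
        if (n <= sample.length)
            sample[(int) n - 1] = value;
        else
        {
            // Reservoir sampling keeps each event with probability size/n.
            long slot = ThreadLocalRandom.current().nextLong(n);
            if (slot < sample.length)
                sample[(int) slot] = value;
        }
    }

    // Weighted statistics (goals 2-4) can then be derived by scaling each
    // range's sample by counts[r] / min(counts[r], samplesPerRange).
}
{code}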

[jira] [Updated] (CASSANDRA-6486) Latency Measurement

2013-12-14 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-6486:
--

Priority: Minor  (was: Major)
Assignee: (was: Benedict)

Let's put this on the back burner.  Coda metrics represents the industry 
standard and is used by hundreds of projects.  It's not perfect, but it's Good 
Enough.

 Latency Measurement
 ---

 Key: CASSANDRA-6486
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6486
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Priority: Minor

 Latency measurement in Cassandra is currently suboptimal. Exactly what the 
 latency measurements tell you isn't intuitively clear due to their 
 exponentially decaying nature, but they amount to some view of the latency per 
 (unweighted) operation over approximately the past 10 minute period, with 
 greater weight given to more recent operations. This has some obvious flaws, 
 the most notable being that due to probabilistic sampling, large outlier 
 events (e.g. GC) can easily be lost over a multi-minute time horizon, and 
 even when caught are unlikely to appear even in the 99.9th percentile due to 
 accounting for a tiny fraction of events numerically.
 I'm generally thinking about how we might improve on this, and want to dump 
 my ideas here for discussion. I think the following things should be targeted:
 1) Ability to see uniform latency measurements for different time horizons 
 stretching back from the present, e.g. last 1s, 1m, 1hr and 1day
 2) Ability to bound the error margin of statistics for all of these intervals
 3) Protect against losing outlier measurements
 4) Possibly offer the ability to weight statistics, so that longer latencies 
 are not underplayed even if they are counted
 5) Preferably non-blocking, memory efficient, and relatively garbage-free
 (3) and (4) are the trickiest, as a theoretically sound and general approach 
 isn't immediately obvious. There are a number of possibilities that spring to 
 mind:
 1) ensure that we have enough sample points that we are probabilistically 
 guaranteed to not lose them, but over large time horizons this is problematic 
 due to memory constraints, and it doesn't address (4);
 2) count large events multiple times (or sub-slices of the events), based on 
 e.g. average op-rate. I am not a fan of this idea because it makes possibly 
 bad assumptions about behaviour and doesn't seem very theoretically sound;
 3) weight the probability of retaining an event by its length. the problem 
 with this approach is that it ties you into (4) without offering the current 
 view of statistics (i.e. unweighted operations), and it also doesn't lend 
 itself to efficient implementation
 I'm currently leaning towards a fourth approach, which attempts to hybridise 
 uniform sampling and histogram behaviour, by separating the sample space into 
 ranges, each some multiple of the last (say 2x the size). Each range has a 
 uniform sample of events that occurred in that range, plus a count of total 
 events. Ideally the size of the sample will be variable based on the number 
 of events occurring in any range, but that there will be a lower-bound 
 calculated to ensure we do not lose events.
 This approach lends itself to all 5 goals above:
 1) by maintaining the same structure for each time horizon, and uniformly 
 sampling from all of the directly lower order time horizons to maintain it;
 2) by imposing minimum sample sizes for each range;
 3) ditto (2);
 4) by producing time/frequency-weighted statistics using the samples and 
 counts from each range;
 5) with thread-local array-based timers that are synchronised with the global 
 timer once every minimum period, by the owning thread
 This also extends reasonably nicely the timers I have already written for 
 CASSANDRA-6199, so some of the work is already done.
 Thoughts / discussion would be welcome, especially if you think I've missed 
 another obvious approach.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (CASSANDRA-6486) Latency Measurement

2013-12-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848518#comment-13848518
 ] 

Benedict commented on CASSANDRA-6486:
-

Sure. I'm not actually working on it, just considering options, and only 
intended to attack this on the side. This can actually all be done in codahale 
with a custom reservoir, and now that I've got a handle on what I'll do, the 
implementation of that reservoir should actually be very easy. So no doubt it 
will start bugging me soon and I'll implement it in my down time sometime 
before the new year.

I do think our current use of codahale is very bad at reporting latency spikes, 
which is more of a problem for our project than for others, given how we 
advertise real-time characteristics.
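For context, the codahale hook being referred to is the Reservoir interface (size/update/getSnapshot). A minimal custom implementation looks roughly like this, assuming metrics-core 3.x where UniformSnapshot is available; it is a plain uniform sample, not the rolling multi-tier design sketched earlier:

{code}
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

import com.codahale.metrics.Reservoir;
import com.codahale.metrics.Snapshot;
import com.codahale.metrics.UniformSnapshot;

// A custom reservoir plugged into codahale: a fixed-size uniform sample
// updated directly by writer threads.
public class UniformWindowReservoir implements Reservoir
{
    private final AtomicLongArray values;
    private final AtomicLong count = new AtomicLong();

    public UniformWindowReservoir(int size)
    {
        this.values = new AtomicLongArray(size);
    }

    @Override
    public int size()
    {
        return (int) Math.min(count.get(), values.length());
    }

    @Override
    public void update(long value)
    {
        long n = count.incrementAndGet();
        if (n <= values.length())
            values.set((int) n - 1, value);
        else
        {
            // Algorithm R: keep each of the n values with probability size/n.
            long slot = ThreadLocalRandom.current().nextLong(n);
            if (slot < values.length())
                values.set((int) slot, value);
        }
    }

    @Override
    public Snapshot getSnapshot()
    {
        int n = size();
        long[] copy = new long[n];
        for (int i = 0; i < n; i++)
            copy[i] = values.get(i);
        return new UniformSnapshot(copy);
    }
}
{code}

A timer would then be constructed as new Timer(new UniformWindowReservoir(1028)) and registered as usual.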

 Latency Measurement
 ---

 Key: CASSANDRA-6486
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6486
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Priority: Minor

 Latency measurement in Cassandra is currently suboptimal. Exactly what the 
 latency measurements tell you isn't intuitively clear due to their 
 exponentially decaying nature, but they amount to some view of the latency per 
 (unweighted) operation over approximately the past 10 minute period, with 
 greater weight given to more recent operations. This has some obvious flaws, 
 the most notable being that due to probabilistic sampling, large outlier 
 events (e.g. GC) can easily be lost over a multi-minute time horizon, and 
 even when caught are unlikely to appear even in the 99.9th percentile due to 
 accounting for a tiny fraction of events numerically.
 I'm generally thinking about how we might improve on this, and want to dump 
 my ideas here for discussion. I think the following things should be targeted:
 1) Ability to see uniform latency measurements for different time horizons 
 stretching back from the present, e.g. last 1s, 1m, 1hr and 1day
 2) Ability to bound the error margin of statistics for all of these intervals
 3) Protect against losing outlier measurements
 4) Possibly offer the ability to weight statistics, so that longer latencies 
 are not underplayed even if they are counted
 5) Preferably non-blocking, memory efficient, and relatively garbage-free
 (3) and (4) are the trickiest, as a theoretically sound and general approach 
 isn't immediately obvious. There are a number of possibilities that spring to 
 mind:
 1) ensure that we have enough sample points that we are probabilistically 
 guaranteed to not lose them, but over large time horizons this is problematic 
 due to memory constraints, and it doesn't address (4);
 2) count large events multiple times (or sub-slices of the events), based on 
 e.g. average op-rate. I am not a fan of this idea because it makes possibly 
 bad assumptions about behaviour and doesn't seem very theoretically sound;
 3) weight the probability of retaining an event by its length. the problem 
 with this approach is that it ties you into (4) without offering the current 
 view of statistics (i.e. unweighted operations), and it also doesn't lend 
 itself to efficient implementation
 I'm currently leaning towards a fourth approach, which attempts to hybridise 
 uniform sampling and histogram behaviour, by separating the sample space into 
 ranges, each some multiple of the last (say 2x the size). Each range has a 
 uniform sample of events that occurred in that range, plus a count of total 
 events. Ideally the size of the sample will be variable based on the number 
 of events occurring in any range, but that there will be a lower-bound 
 calculated to ensure we do not lose events.
 This approach lends itself to all 5 goals above:
 1) by maintaining the same structure for each time horizon, and uniformly 
 sampling from all of the directly lower order time horizons to maintain it;
 2) by imposing minimum sample sizes for each range;
 3) ditto (2);
 4) by producing time/frequency-weighted statistics using the samples and 
 counts from each range;
 5) with thread-local array-based timers that are synchronised with the global 
 timer once every minimum period, by the owning thread
 This also extends reasonably nicely the timers I have already written for 
 CASSANDRA-6199, so some of the work is already done.
 Thoughts / discussion would be welcome, especially if you think I've missed 
 another obvious approach.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)