[jira] [Commented] (CASSANDRA-12643) Estimated histograms tend to overflow

2016-11-15 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667389#comment-15667389
 ] 

Paulo Motta commented on CASSANDRA-12643:
-

bq. I think if we just show the values it would be basically easy to determine 
why the algorithm overflows. The name is fairly meaningless. If the values are 
0,0,0 and that overflows we can feed that into a unit test and reason about 
what it is supposed to do.

The values alone are fairly meaningless without the metric name, since a 
histogram is just a container of values, so an overflowed histogram will 
typically indicate a problem with the metric, and not with the histogram 
implementation.

While I think the correct approach would be for the histogram accessor, not the 
histogram implementation itself, to print the overflowed histogram when an 
exception occurs while retrieving values, I understand that this would require 
modifying reporters, which is not always easy. So I think it's acceptable to 
log the histogram values when fetching, provided that the user enables this for 
troubleshooting. To avoid polluting {{debug.log}}, we should log this at 
{{TRACE}}, so the operator enables it only when they want to troubleshoot 
overflowing histograms.
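
A minimal sketch of the guarded logging described above. It uses {{java.util.logging}} only to stay self-contained, with {{FINEST}} standing in for {{TRACE}}; Cassandra itself logs through SLF4J/logback, and the class and method names here are hypothetical.

```java
import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper for illustration; not part of Cassandra.
final class HistogramTraceSketch {
    // Logger name borrowed from the package mentioned in this thread.
    private static final Logger logger = Logger.getLogger("org.apache.cassandra.metrics");

    // Formats the raw bucket counts for a log line.
    static String describe(long[] buckets) {
        return "Histogram buckets on fetch: " + Arrays.toString(buckets);
    }

    // Returns a copy of the buckets, logging them only when the finest level is enabled,
    // so the formatting cost is not paid during normal operation.
    static long[] fetch(long[] buckets) {
        if (logger.isLoggable(Level.FINEST)) {
            logger.finest(describe(buckets));
        }
        return buckets.clone();
    }

    public static void main(String[] args) {
        long[] copy = fetch(new long[] {0, 0, 1});
        System.out.println(describe(copy));
    }
}
```

The level check keeps the log quiet unless an operator explicitly raises the level for that logger.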

I think we should also take this chance to improve the 
{{IllegalStateException}} message to include the metric name, so users will 
know which metric overflowed when they get the exception and can enable 
{{TRACE}} on {{org.apache.cassandra.metrics}} if they want to investigate 
further. For this, we probably need to pass the metric name to the histogram 
constructor.
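
One way the metric name could be threaded through the constructor and into the exception message is sketched below. {{NamedHistogramSketch}}, its constructor signature, and the message format are all hypothetical; this is not the {{EstimatedHistogram}} or {{DecayingEstimatedHistogramReservoir}} code.

```java
import java.util.Arrays;

// Hypothetical histogram that knows which metric it belongs to.
final class NamedHistogramSketch {
    private final String metricName;    // assumed extra constructor argument
    private final long[] bucketOffsets; // upper bound of each regular bucket
    private final long[] buckets;       // one extra slot at the end counts overflowed values

    NamedHistogramSketch(String metricName, long[] bucketOffsets) {
        this.metricName = metricName;
        this.bucketOffsets = bucketOffsets.clone();
        this.buckets = new long[bucketOffsets.length + 1];
    }

    // Counts the value in the first bucket whose offset is >= value, or in the overflow slot.
    void add(long value) {
        int idx = Arrays.binarySearch(bucketOffsets, value);
        if (idx < 0) {
            idx = -idx - 1; // insertion point
        }
        buckets[idx]++;
    }

    boolean isOverflowed() {
        return buckets[buckets.length - 1] > 0;
    }

    // Returns the offset of the bucket containing the requested percentile (0.0 to 1.0).
    long percentile(double p) {
        if (isOverflowed()) {
            throw new IllegalStateException(
                "Unable to compute when histogram overflowed (metric: " + metricName + ")");
        }
        long total = 0;
        for (long count : buckets) {
            total += count;
        }
        if (total == 0) {
            return 0;
        }
        long target = (long) Math.ceil(total * p);
        long seen = 0;
        for (int i = 0; i < bucketOffsets.length; i++) {
            seen += buckets[i];
            if (seen >= target) {
                return bucketOffsets[i];
            }
        }
        return bucketOffsets[bucketOffsets.length - 1];
    }

    public static void main(String[] args) {
        NamedHistogramSketch h = new NamedHistogramSketch("ExampleMetric", new long[] {1, 2, 3});
        h.add(100); // larger than every offset, so it lands in the overflow slot
        try {
            h.percentile(0.5);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```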

Another thing is that CASSANDRA-11752 changed the 
{{EstimatedHistogramReservoir}} to {{DecayingEstimatedHistogramReservoir}}, so 
I think you will want to work on this class instead.

> Estimated histograms tend to overflow
> -
>
> Key: CASSANDRA-12643
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12643
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12643) Estimated histograms tend to overflow

2016-11-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15658633#comment-15658633
 ] 

Edward Capriolo commented on CASSANDRA-12643:
-

{quote}
I think the right fix should actually be on LibratoReporter to print which 
metric is overflowed when getting an exception
{quote}
That would be the right fix, but it is not the easy fix. Unfortunately, the 
abstraction provided by reporters only gives you a reportGauge(). You cannot 
redefine that functionality without changing the metric-reporting classes.

That is really the big problem. The code looks like this (and it is not code 
defined in apache-cassandra):

{noformat}
void report() {
    for (Gauge g : gauges) {
        reportGauge(g);     // no try/catch around the read
    }
    for (Counter c : counters) {
        reportCounter(c);
    }
}
{noformat}

Most reporters do NOT even use try/catch, so throwing any exception generally 
just causes the reporter run to fail, with the result that only some metrics 
get reported. Essentially nothing anywhere expects a metric to throw an 
exception or assertion error.
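
For contrast, a defensive loop could look roughly like the sketch below, where one failing metric is recorded by name and skipped instead of aborting the rest of the run. This is an illustration only: {{DefensiveReporterSketch}} and its {{report}} signature are made up, and metrics are modelled as plain {{LongSupplier}}s rather than the real reporter types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical reporter loop; not taken from any real metrics reporter.
final class DefensiveReporterSketch {
    // Reads every metric. A metric that throws is noted in 'failures' (with its name)
    // and skipped, so the remaining metrics are still read.
    static Map<String, Long> report(Map<String, LongSupplier> metrics, List<String> failures) {
        Map<String, Long> values = new LinkedHashMap<>();
        for (Map.Entry<String, LongSupplier> entry : metrics.entrySet()) {
            try {
                values.put(entry.getKey(), entry.getValue().getAsLong());
            } catch (RuntimeException ex) {
                failures.add(entry.getKey() + ": " + ex.getMessage());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, LongSupplier> metrics = new LinkedHashMap<>();
        metrics.put("healthy", () -> 42L);
        metrics.put("overflowed", () -> {
            throw new IllegalStateException("Unable to compute when histogram overflowed");
        });
        List<String> failures = new ArrayList<>();
        System.out.println(report(metrics, failures));
        System.out.println(failures);
    }
}
```

Because the catch sits at the point where the metric name is known, the failure record can say which metric overflowed, which the histogram itself cannot.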

I agree it would be ideal to see the name and values. But it's not easy. 

I think if we just show the values it would be basically easy to determine why 
the algorithm overflows. The name is fairly meaningless. If the values are 
0,0,0 we can feed that into a unit test and reason about what it is supposed to 
do.



[jira] [Commented] (CASSANDRA-12643) Estimated histograms tend to overflow

2016-11-11 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15658234#comment-15658234
 ] 

Paulo Motta commented on CASSANDRA-12643:
-

I'm not sure how helpful this will be without the metric name, since you get 
the histogram values but do not know which metric they refer to. I think the 
right fix should actually be on {{LibratoReporter}} to print which metric is 
overflowed when getting an exception, in addition to the histogram values. That 
said, I think we should include the histogram buckets in the 
{{IllegalStateException}} message, so they are logged in the right context, 
possibly indicating the metric name, rather than blindly logging them without 
context in the {{EstimatedHistogram}} class. WDYT?



[jira] [Commented] (CASSANDRA-12643) Estimated histograms tend to overflow

2016-10-14 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15575478#comment-15575478
 ] 

Edward Capriolo commented on CASSANDRA-12643:
-

This patch would help administrators understand what overflows and why. It can 
be used to refine the algorithm to fail less. Comments appreciated.



[jira] [Commented] (CASSANDRA-12643) Estimated histograms tend to overflow

2016-09-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494465#comment-15494465
 ] 

Edward Capriolo commented on CASSANDRA-12643:
-

Also, I wanted to point something out. The EstimatedHistogram is also used for 
non-reporting cases: if you do a usage search, there are some internal 
structures that are sized based on it. While in practice it may not be a 
problem, I am struggling with an "estimator" that throws a RuntimeException. 
After all, it is an estimate. None of the callers check for this exception or 
do anything about it. Theoretically this could cause a process to never 
complete. I am thinking about two methods: one that always returns data, so 
that exceptions do not bubble up into the reporter, and possibly a second that 
throws a checked/unchecked exception so that callers can implement fallback 
logic. 
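
A rough sketch of what such a pair of methods might look like. All names here ({{EstimatorApiSketch}}, {{percentileOrMax}}) are hypothetical, and capping at the largest known offset is just one possible choice for the "always returns data" variant.

```java
// Hypothetical API shape; not the actual EstimatedHistogram interface.
final class EstimatorApiSketch {
    private final long[] offsets; // upper bound of each regular bucket
    private final long[] counts;  // counts.length == offsets.length + 1; last slot = overflow

    EstimatorApiSketch(long[] offsets, long[] counts) {
        if (counts.length != offsets.length + 1) {
            throw new IllegalArgumentException("counts must have one more slot than offsets");
        }
        this.offsets = offsets.clone();
        this.counts = counts.clone();
    }

    // Strict variant: throws on overflow so that callers can apply their own fallback.
    long percentile(double p) {
        if (counts[counts.length - 1] > 0) {
            throw new IllegalStateException("Unable to compute when histogram overflowed");
        }
        return estimate(p);
    }

    // Lenient variant: never throws on overflow; if the percentile falls into the
    // overflow slot, the largest known offset is returned as a lower-bound estimate.
    long percentileOrMax(double p) {
        return estimate(p);
    }

    private long estimate(double p) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        if (total == 0) {
            return 0;
        }
        long target = (long) Math.ceil(total * p);
        long seen = 0;
        for (int i = 0; i < offsets.length; i++) {
            seen += counts[i];
            if (seen >= target) {
                return offsets[i];
            }
        }
        return offsets[offsets.length - 1];
    }

    public static void main(String[] args) {
        EstimatorApiSketch h = new EstimatorApiSketch(new long[] {1, 2, 3}, new long[] {0, 1, 0, 1});
        System.out.println(h.percentileOrMax(0.99)); // lenient call used by a reporter
    }
}
```

A reporter would call the lenient method, while internal callers that size structures from the estimate could call the strict one and handle the exception explicitly.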



[jira] [Commented] (CASSANDRA-12643) Estimated histograms tend to overflow

2016-09-14 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15491695#comment-15491695
 ] 

Edward Capriolo commented on CASSANDRA-12643:
-

https://github.com/apache/cassandra/compare/trunk...edwardcapriolo:CASSANDRA-12643

It would be nice to know which data it was, but at least seeing the data should 
help us understand this.



[jira] [Commented] (CASSANDRA-12643) Estimated histograms tend to overflow

2016-09-14 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15491600#comment-15491600
 ] 

Edward Capriolo commented on CASSANDRA-12643:
-

After enabling a reporter and starting up Cassandra I have observed the 
following stack trace.

{noformat}
java.lang.IllegalStateException: Unable to compute when histogram overflowed
    at org.apache.cassandra.utils.EstimatedHistogram.percentile(EstimatedHistogram.java:198) ~apache-cassandra-3.0.8.jar:3.0.8
    at org.apache.cassandra.metrics.EstimatedHistogramReservoir$HistogramSnapshot.getValue(EstimatedHistogramReservoir.java:85) ~apache-cassandra-3.0.8.jar:3.0.8
    at com.codahale.metrics.Snapshot.getMedian(Snapshot.java:38) ~metrics-core-3.1.0.jar:3.1.0
    at com.librato.metrics.MetricsLibratoBatch.addSampling(MetricsLibratoBatch.java:144) ~metrics-librato-4.1.2.5.jar:na
    at com.librato.metrics.MetricsLibratoBatch.addHistogram(MetricsLibratoBatch.java:124) ~metrics-librato-4.1.2.5.jar:na
    at com.librato.metrics.LibratoReporter.report(LibratoReporter.java:167) ~metrics-librato-4.1.2.5.jar:na
    at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~metrics-core-3.1.0.jar:3.1.0
    at com.librato.metrics.LibratoReporter.report(LibratoReporter.java:127) ~metrics-librato-4.1.2.5.jar:na
    at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) metrics-core-3.1.0.jar:3.1.0
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) na:1.8.0_101
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) na:1.8.0_101
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) na:1.8.0_101
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) na:1.8.0_101
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) na:1.8.0_101
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) na:1.8.0_101
    at java.lang.Thread.run(Thread.java:745) na:1.8.0_101
{noformat}

According to the inline documentation, the largest bucket should accommodate 36 
seconds. With the server having been alive for only a few seconds, it seems 
unlikely that these buckets could overflow. 
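
To make the bucket layout concrete, here is a rough sketch of how such offsets can be generated and how a value ends up in the overflow slot. It assumes offsets start at 1 and grow by a factor of about 1.2, with duplicates bumped by one; the class and method names are made up for illustration, and this is not the {{EstimatedHistogram}} source.

```java
import java.util.Arrays;

// Illustrative reconstruction under the assumptions stated above.
final class BucketOffsetsSketch {
    // Builds 'size' strictly increasing offsets: 1, 2, 3, ... growing by roughly 1.2x.
    static long[] newOffsets(int size) {
        long[] result = new long[size];
        long last = 1;
        result[0] = last;
        for (int i = 1; i < size; i++) {
            long next = Math.round(last * 1.2);
            if (next == last) {
                next++; // rounding produced a duplicate, so step forward by one
            }
            result[i] = next;
            last = next;
        }
        return result;
    }

    // Index of the first offset >= value; offsets.length means the overflow slot.
    static int bucketIndex(long[] offsets, long value) {
        int idx = Arrays.binarySearch(offsets, value);
        return idx < 0 ? -idx - 1 : idx;
    }

    public static void main(String[] args) {
        long[] offsets = newOffsets(13);
        System.out.println(Arrays.toString(offsets));
        // Any value above the last offset maps to the overflow slot.
        System.out.println(bucketIndex(offsets, Long.MAX_VALUE) == offsets.length);
    }
}
```

Under this layout a single recorded value larger than the last offset is enough to mark the histogram as overflowed, regardless of how long the server has been running.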

This overflow bubbles up to the reporter and prevents data from being 
exported. I poked around and do not understand why a fresh server would 
overflow. I found some nits in the code that I can fix, and maybe someone with 
more brain power can chime in.
