[jira] [Commented] (HBASE-5786) Implement histogram metrics for flush and compaction latencies and sizes.

2012-07-16 Thread Jonathan Creasy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13415856#comment-13415856
 ] 

Jonathan Creasy commented on HBASE-5786:


Working on this tonight. I'll just use the MetricsHistogram for now.

 Implement histogram metrics for flush and compaction latencies and sizes.
 -

 Key: HBASE-5786
 URL: https://issues.apache.org/jira/browse/HBASE-5786
 Project: HBase
  Issue Type: New Feature
  Components: metrics, regionserver
Affects Versions: 0.92.2, 0.94.0, 0.96.0
Reporter: Jonathan Hsieh
 Fix For: 0.96.0


 Average time for region operations doesn't really tell a useful story that 
 helps diagnose anomalous conditions.
 It would be extremely useful to add histogramming metrics similar to 
 HBASE-5533 for region operations like flush, compaction and splitting.  These 
 probably should be forward biased at a much coarser granularity, however 
 (maybe decay every day?) 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5786) Implement histogram metrics for flush and compaction latencies and sizes.

2012-07-09 Thread Jonathan Creasy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13409725#comment-13409725
 ] 

Jonathan Creasy commented on HBASE-5786:


I'm interested in working on a patch for this. It seems like a pretty good 
starter task for getting involved in HBase development.





[jira] [Commented] (HBASE-5786) Implement histogram metrics for flush and compaction latencies and sizes.

2012-07-09 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13409791#comment-13409791
 ] 

Elliott Clark commented on HBASE-5786:
--

@jonathan
This would be a good project to get you started.  I would ignore the 
discussions about the accuracy of our histograms and just use the 
MetricsHistogram for now.





[jira] [Commented] (HBASE-5786) Implement histogram metrics for flush and compaction latencies and sizes.

2012-07-09 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13409801#comment-13409801
 ] 

Zhihong Ted Yu commented on HBASE-5786:
---

MetricsHistogram depends on the following:
{code}
import org.apache.hadoop.metrics.MetricsRecord;
import org.apache.hadoop.metrics.util.MetricsBase;
import org.apache.hadoop.metrics.util.MetricsRegistry;
{code}
which are deprecated in Hadoop.

See discussion 'deprecating (old) metrics in favor of metrics2 framework' on 
dev@ list.





[jira] [Commented] (HBASE-5786) Implement histogram metrics for flush and compaction latencies and sizes.

2012-07-09 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13409812#comment-13409812
 ] 

Elliott Clark commented on HBASE-5786:
--

Yes, Ted, it is deprecated.  However, right now it's the best that we have. I have 
other JIRAs that have metrics2 implementations, but I don't think that 
it's appropriate to expect a first-time contributor to make all of those 
changes before adding a smaller fix.  When we move all of our implementation 
over to metrics2, MetricsHistogram will have to be addressed there too.





[jira] [Commented] (HBASE-5786) Implement histogram metrics for flush and compaction latencies and sizes.

2012-06-22 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13399454#comment-13399454
 ] 

Elliott Clark commented on HBASE-5786:
--

The library we use takes time-decaying samples in buckets.  So yes, we lose some 
accuracy in the higher percentiles if that extreme data was a long time ago.  
However, for newer data we are more accurate; if the spread of times stays 
constant then we'll be very accurate.  If our data were normally distributed we 
would have less than 5% error (at least from my understanding of 
http://www.research.att.com/people/Cormode_Graham/library/publications/CormodeShkapenyukSrivastavaXu09.pdf) 
on all of the measures.  For me, a 5% upper bound on the error of a metric seems 
good enough.  All of the other methods that I looked at take a lot longer to 
compute, so I don't think they are worth it.





[jira] [Commented] (HBASE-5786) Implement histogram metrics for flush and compaction latencies and sizes.

2012-06-22 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13399486#comment-13399486
 ] 

Andrew Wang commented on HBASE-5786:


I don't think you can assume a normal distribution for latency. I think it 
looks more Zipfian in practice, or maybe bi-modal because of cache misses. 
Also, a 5% error on a 95th percentile is kind of huge; IIUC, that means it's 
actually reporting between the 90th and 100th percentile. [1] by the same 
authors as your link discusses sampling for high-percentiles.
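
The arithmetic behind the "90th to 100th percentile" reading can be sketched 
in a few lines of Java. This assumes the 5% figure is an additive rank error 
(an estimate of the q-th percentile may sit anywhere within 5 percentile 
points of q), which is the interpretation used in the comment above. The 
class and method names are hypothetical and are not part of HBase.

```java
/** Hypothetical helper: which ranks may an approximate percentile actually have? */
class RankErrorBounds {
    /**
     * Returns {lowestRank, highestRank} (1-based) among n sorted samples for an
     * estimate of the pct-th percentile with an additive rank error of errPct
     * percentile points. Integer arithmetic avoids floating-point rounding.
     */
    static long[] rankBounds(int pct, int errPct, long n) {
        if (n <= 0 || pct < 0 || pct > 100 || errPct < 0) {
            throw new IllegalArgumentException("bad arguments");
        }
        long lo = ((long) (pct - errPct) * n + 99) / 100; // ceil((pct - err) / 100 * n)
        long hi = ((long) (pct + errPct) * n) / 100;      // floor((pct + err) / 100 * n)
        return new long[] {Math.max(1, lo), Math.min(n, hi)};
    }
}
```

With 1000 samples, `rankBounds(95, 5, 1000)` gives ranks 900 through 1000, 
so the reported "95th percentile" could be anything from the 90th percentile 
up to the maximum. Tightening the error to 1 point narrows that to ranks 940 
through 960.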

I found [2], which I think is well-suited for our use case, since it can do 
approximate quantiles on a sliding time window. Space and time bounds seem to 
be O(reasonable log factors). Somehow mashing up [2] to use [1] would be 
optimal, but doing just [2] is probably okay too.

[1] http://www.cs.rutgers.edu/~muthu/bquant.pdf
[2] http://infolab.stanford.edu/~manku/papers/04pods-sliding.pdf





[jira] [Commented] (HBASE-5786) Implement histogram metrics for flush and compaction latencies and sizes.

2012-06-22 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13399488#comment-13399488
 ] 

Elliott Clark commented on HBASE-5786:
--

You're absolutely right about it being Zipfian; I was just trying to discuss the 
upper bound on error without having to do the math for confidence with 
different distributions.  Seems like another issue should be filed to implement 
different sampling so that we don't derail this one too much.





[jira] [Commented] (HBASE-5786) Implement histogram metrics for flush and compaction latencies and sizes.

2012-06-22 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13399576#comment-13399576
 ] 

Andrew Wang commented on HBASE-5786:


Opened HBASE-6261 for high-percentile latency estimation; let's take it there.





[jira] [Commented] (HBASE-5786) Implement histogram metrics for flush and compaction latencies and sizes.

2012-06-21 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13399053#comment-13399053
 ] 

Andrew Wang commented on HBASE-5786:


A real stats expert can weigh in, but I don't think the current sampling 
methods are well-suited for computing high-percentile latencies. Reservoir 
sampling is fine for computing gross statistics like the mean and stddev, but 
you really want to be biasing your sampling toward the top end for accurate 
95th and 99th percentile estimates.

I unfortunately don't have any solutions yet, but I'm looking into it.





[jira] [Commented] (HBASE-5786) Implement histogram metrics for flush and compaction latencies and sizes.

2012-06-19 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396880#comment-13396880
 ] 

Elliott Clark commented on HBASE-5786:
--

I actually think that most metrics should go toward histograms rather than 
TimeVaryingRate.  Others have mentioned that an average is not really useful 
when HBase is the primary data store of a live website/app.  That use case is 
much more interested in the 75th/95th/99th percentiles.  Once HBASE-6211 is in 
I can take this up, maybe with some of the metrics refactor that Lars is 
talking about.
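
To illustrate why an average hides what the percentiles show, here is a small 
self-contained Java sketch. The latency values are invented for the example, 
and `LatencySummary` is a hypothetical class, not MetricsHistogram or any 
other HBase API. It uses the nearest-rank definition of a percentile, which 
is only one of several common definitions.

```java
import java.util.Arrays;

/** Hypothetical helper comparing a mean with nearest-rank percentiles. */
class LatencySummary {
    static double mean(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    /** Nearest-rank percentile for p in 1..100: the value at rank ceil(p/100 * n). */
    static long percentile(long[] values, int p) {
        if (values.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (p * sorted.length + 99) / 100; // integer ceil
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up data: 90 fast operations (10 ms) and 10 slow ones (2000 ms).
        long[] latenciesMs = new long[100];
        Arrays.fill(latenciesMs, 0, 90, 10L);
        Arrays.fill(latenciesMs, 90, 100, 2000L);

        System.out.println("mean = " + mean(latenciesMs));           // 209.0
        System.out.println("p75  = " + percentile(latenciesMs, 75)); // 10
        System.out.println("p95  = " + percentile(latenciesMs, 95)); // 2000
        System.out.println("p99  = " + percentile(latenciesMs, 99)); // 2000
    }
}
```

The mean of 209 ms matches no operation that actually ran, while the 
percentiles show both the typical case (10 ms) and the slow tail (2000 ms).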





[jira] [Commented] (HBASE-5786) Implement histogram metrics for flush and compaction latencies and sizes.

2012-04-13 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253831#comment-13253831
 ] 

Todd Lipcon commented on HBASE-5786:


Or even keep a round-robin buffer of the last 1000 such operations? It would be 
a minuscule amount of RAM to track this, right?
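
A minimal sketch of such a round-robin buffer, to make the memory cost 
concrete. `RecentOperations` is a hypothetical name and this is not code from 
any HBase patch; it assumes each operation is summarised by a single long 
(for example a latency in milliseconds or a size in bytes).

```java
import java.util.Arrays;

/** Hypothetical fixed-size ring buffer that keeps only the most recent values. */
class RecentOperations {
    private final long[] slots;
    private long recorded; // total number of values ever recorded

    RecentOperations(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.slots = new long[capacity];
    }

    /** Stores the value, overwriting the oldest entry once the buffer is full. */
    synchronized void record(long value) {
        slots[(int) (recorded % slots.length)] = value;
        recorded++;
    }

    /** Copy of the retained values, in slot order rather than arrival order. */
    synchronized long[] snapshot() {
        int n = (int) Math.min(recorded, slots.length);
        return Arrays.copyOf(slots, n);
    }
}
```

With a capacity of 1000 the backing array holds 1000 longs, which is 8000 
bytes plus a small array header per tracked metric. The trade-off is that 
anything older than the last 1000 operations is forgotten entirely.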

 Implement histogram metrics for flush and compaction latencies and sizes.
 -

 Key: HBASE-5786
 URL: https://issues.apache.org/jira/browse/HBASE-5786
 Project: HBase
  Issue Type: New Feature
  Components: metrics, regionserver
Affects Versions: 0.92.2, 0.94.0, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Shaneal Manek





[jira] [Commented] (HBASE-5786) Implement histogram metrics for flush and compaction latencies and sizes.

2012-04-13 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253861#comment-13253861
 ] 

Jonathan Hsieh commented on HBASE-5786:
---


At least for the constant write load testing I'm doing currently, 1000 entries 
would be exhausted in about 8 hours.  

The histogramming currently uses reservoir sampling (with 100 slots) to keep 
metrics over all time.  Forward biasing makes more recent entries favored.

With the sampling method we could keep reasonable metrics for longer periods of 
time (weeks).
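
For readers unfamiliar with reservoir sampling, here is a sketch of the plain 
uniform variant (often called Algorithm R) with a configurable number of 
slots. It is an illustration only: `UniformReservoir` is a hypothetical class, 
not the sampler HBase uses, and it does not implement the forward biasing 
mentioned above, which would additionally favor recent entries.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical uniform reservoir sampler (Algorithm R) over a stream of longs. */
class UniformReservoir {
    private final long[] slots;
    private final Random rng;
    private long seen; // total number of values offered so far

    UniformReservoir(int capacity, long seed) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.slots = new long[capacity];
        this.rng = new Random(seed);
    }

    synchronized void update(long value) {
        seen++;
        if (seen <= slots.length) {
            // Still filling the reservoir: keep every value.
            slots[(int) (seen - 1)] = value;
            return;
        }
        // Pick an index uniformly in [0, seen). The new value is kept only if
        // the index falls inside the reservoir, i.e. with probability
        // capacity / seen, replacing whatever was stored in that slot.
        long j = (long) (rng.nextDouble() * seen);
        if (j < slots.length) {
            slots[(int) j] = value;
        }
    }

    /** Copy of the current sample; at most 'capacity' values. */
    synchronized long[] snapshot() {
        return Arrays.copyOf(slots, (int) Math.min(seen, slots.length));
    }
}
```

With 100 slots the memory use stays constant no matter how many operations 
are observed, which is why a sample can cover weeks of activity where a 
1000-entry buffer of raw values covers only hours. The cost is that every 
statistic computed from the snapshot is an estimate.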
