[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-12-08 Thread Vikas Vishwakarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046758#comment-15046758
 ] 

Vikas Vishwakarma commented on HBASE-14869:
---

I deployed it to a cluster and did a quick run using a loader; a sample 
distribution is given below:

"ResponseSize_min" : 2,
"ResponseSize_max" : 8095877,
"ResponseSize_mean" : 160129.16119263918,
"ResponseSize_median" : 44676.0,
"ResponseSize_75th_percentile" : 44676.0,
"ResponseSize_90th_percentile" : 44676.0,
"ResponseSize_95th_percentile" : 44676.0,
"ResponseSize_99th_percentile" : 8083077.0,
"ResponseSize_SizeRangeCount_0-10" : 5,
"ResponseSize_SizeRangeCount_10-100" : 2,
"ResponseSize_SizeRangeCount_100-1000" : 205,
"ResponseSize_SizeRangeCount_10000-100000" : 4018,
"ResponseSize_SizeRangeCount_1000000-10000000" : 63,

"RequestSize_min" : 61,
"RequestSize_max" : 1785561,
"RequestSize_mean" : 1779622.1719077567,
"RequestSize_median" : 1785559.0,
"RequestSize_75th_percentile" : 1785559.0,
"RequestSize_90th_percentile" : 1785561.0,
"RequestSize_95th_percentile" : 1785561.0,
"RequestSize_99th_percentile" : 1785561.0,
"RequestSize_SizeRangeCount_10-100" : 3,
"RequestSize_SizeRangeCount_100-1000" : 4,
"RequestSize_SizeRangeCount_1000000-10000000" : 4286,
---
"AppendSize_min" : 301,
"AppendSize_max" : 1606400,
"AppendSize_mean" : 1600143.9756582216,
"AppendSize_median" : 1606400.0,
"AppendSize_75th_percentile" : 1606400.0,
"AppendSize_90th_percentile" : 1606400.0,
"AppendSize_95th_percentile" : 1606400.0,
"AppendSize_99th_percentile" : 1606400.0,
"AppendSize_SizeRangeCount_100-1000" : 8,
"AppendSize_SizeRangeCount_1000000-10000000" : 4018,

"ProcessCallTime_min" : 0,
"ProcessCallTime_max" : 2577,
"ProcessCallTime_mean" : 153.72955974842768,
"ProcessCallTime_median" : 74.5,
"ProcessCallTime_75th_percentile" : 110.0,
"ProcessCallTime_90th_percentile" : 251.499977,
"ProcessCallTime_95th_percentile" : 534.64999,
"ProcessCallTime_99th_percentile" : 1313.23998,
"ProcessCallTime_TimeRangeCount_0-1" : 204,
"ProcessCallTime_TimeRangeCount_1-3" : 3,
"ProcessCallTime_TimeRangeCount_3-10" : 2,
"ProcessCallTime_TimeRangeCount_10-30" : 2,
"ProcessCallTime_TimeRangeCount_30-100" : 2914,
"ProcessCallTime_TimeRangeCount_100-300" : 715,
"ProcessCallTime_TimeRangeCount_300-1000" : 310,
"ProcessCallTime_TimeRangeCount_1000-3000" : 143,
--
"AppendTime_min" : 0,
"AppendTime_max" : 1,
"AppendTime_mean" : 0.00273224043715847,
"AppendTime_median" : 0.0,
"AppendTime_75th_percentile" : 0.0,
"AppendTime_90th_percentile" : 0.0,
"AppendTime_95th_percentile" : 0.0,
"AppendTime_99th_percentile" : 0.0,
"AppendTime_TimeRangeCount_0-1" : 4026
-
"Mutate_min" : 32,
"Mutate_max" : 2575,
"Mutate_mean" : 160.91325655476598,
"Mutate_median" : 75.5,
"Mutate_75th_percentile" : 110.0,
"Mutate_90th_percentile" : 258.5,
"Mutate_95th_percentile" : 558.5499,
"Mutate_99th_percentile" : 1312.17998,
"Mutate_TimeRangeCount_30-100" : 2926,
"Mutate_TimeRangeCount_100-300" : 703,
"Mutate_TimeRangeCount_300-1000" : 309,
"Mutate_TimeRangeCount_1000-3000" : 143,

"SplitTime_min" : 72,
"SplitTime_max" : 11320,
"SplitTime_mean" : 7276.5,
"SplitTime_median" : 8857.0,
"SplitTime_75th_percentile" : 10892.0,
"SplitTime_90th_percentile" : 11320.0,
"SplitTime_95th_percentile" : 11320.0,
"SplitTime_99th_percentile" : 11320.0,
"SplitTime_TimeRangeCount_30-100" : 1,
"SplitTime_TimeRangeCount_3000-10000" : 2,
"SplitTime_TimeRangeCount_10000-30000" : 1

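A minimal Java sketch of the after-the-fact SLA question from the issue description, using only the ProcessCallTime_TimeRangeCount_* numbers from the sample run above. The class and method names are hypothetical, and it assumes bucket keys of the form "lower-upper" (or "lower-inf") in milliseconds. It cannot split a bucket that straddles the threshold, so the threshold should sit on a bucket boundary.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: answers "how many calls landed at or above X ms?"
// from range-bucket counts alone, without the raw samples.
class SlaFromBuckets {

  // Sums counts of buckets whose lower bound is >= thresholdMs.
  // A bucket straddling the threshold is counted entirely below it.
  static long countAtOrAbove(Map<String, Long> buckets, long thresholdMs) {
    long total = 0;
    for (Map.Entry<String, Long> e : buckets.entrySet()) {
      long lower = Long.parseLong(e.getKey().split("-")[0]);
      if (lower >= thresholdMs) {
        total += e.getValue();
      }
    }
    return total;
  }

  // Total number of recorded calls across all buckets.
  static long totalCount(Map<String, Long> buckets) {
    long total = 0;
    for (long v : buckets.values()) {
      total += v;
    }
    return total;
  }

  // The ProcessCallTime_TimeRangeCount_* values from the sample run.
  static Map<String, Long> processCallTime() {
    Map<String, Long> b = new LinkedHashMap<>();
    b.put("0-1", 204L);
    b.put("1-3", 3L);
    b.put("3-10", 2L);
    b.put("10-30", 2L);
    b.put("30-100", 2914L);
    b.put("100-300", 715L);
    b.put("300-1000", 310L);
    b.put("1000-3000", 143L);
    return b;
  }

  public static void main(String[] args) {
    Map<String, Long> b = processCallTime();
    long slow = countAtOrAbove(b, 300);
    long total = totalCount(b);
    System.out.printf("%d of %d calls in buckets >= 300 ms (%.2f%%)%n",
        slow, total, 100.0 * slow / total);
  }
}
```

With the sample counts, 310 + 143 = 453 of 4293 calls fall in the 300-1000 and 1000-3000 buckets, roughly 10.55%.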



> Better request latency histograms
> -
>
> Key: HBASE-14869
> URL: https://issues.apache.org/jira/browse/HBASE-14869
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Lars Hofhansl
>Assignee: Vikas Vishwakarma
> Fix For: 2.0.0, 1.3.0, 0.98.17
>
> Attachments: 14869-test-0.98.txt, 14869-v1-0.98.txt, 
> 14869-v1-2.0.txt, 14869-v2-0.98.txt, 14869-v2-2.0.txt, 14869-v3-0.98.txt, 
> 14869-v4-0.98.txt, 14869-v5-0.98.txt, AppendSizeTime.png, Get.png
>
>
> I just discussed this with a colleague.
> The get, put, etc. histograms that each region server keeps are somewhat 
> useless (depending on what you want to achieve of course), as they are 
> aggregated and calculated by each region server.
> It would be better to record the number of requests in certain latency 
> bands in addition to what we do now.
> For example the number of gets that took 0-5ms, 6-10ms, 10-20ms, 20-50ms, 
> 50-100ms, 100-1000ms, > 1000ms, etc. (just as an example, should be 
> configurable).
> That way we can do further calculations after the fact, and answer questions 
> like: How often did we miss our SLA? Percentage of requests that missed an 
> SLA, etc.
> Comments?

[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-12-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047239#comment-15047239
 ] 

Lars Hofhansl commented on HBASE-14869:
---

There's one last change I think we should make (discussed with [~apurtell]):
instead of naming the last range like Get_>60, it might be better to name it 
like this: Get_60-inf
That's easier to understand and should be a bit easier to process. I can make 
that change upon commit.
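A minimal sketch of the proposed naming, assuming a plain array of bucket upper bounds: each bucket is named "<prior>-<upper>", and the open-ended overflow bucket becomes "<last>-inf" rather than ">last". The bounds {1, 3, 10, 30, 60} and the RangeNames class are illustrative assumptions, not the values or code from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds names like "Get_0-1", "Get_1-3", ..., "Get_60-inf".
class RangeNames {

  static List<String> bucketNames(String prefix, long[] upperBounds) {
    List<String> names = new ArrayList<>();
    long prior = 0;
    for (long upper : upperBounds) {
      names.add(prefix + "_" + prior + "-" + upper);
      prior = upper;
    }
    // Overflow bucket for anything above the last bound, "-inf" style.
    names.add(prefix + "_" + prior + "-inf");
    return names;
  }

  public static void main(String[] args) {
    for (String name : bucketNames("Get", new long[] {1, 3, 10, 30, 60})) {
      System.out.println(name);
    }
  }
}
```

Every name then has the same lower-upper shape, so a script can split on "-" without special-casing a leading ">".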



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-12-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047261#comment-15047261
 ] 

Andrew Purtell commented on HBASE-14869:


+1


[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047739#comment-15047739
 ] 

Hadoop QA commented on HBASE-14869:
---

{color:red}-1 overall{color}.  

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16795//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16795//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16795//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16795//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16795//console

This message is automatically generated.


[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-12-07 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045376#comment-15045376
 ] 

Andrew Purtell commented on HBASE-14869:


Then, if there are no objections or further comments, I will commit this today.


[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-12-07 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045388#comment-15045388
 ] 

Andrew Purtell commented on HBASE-14869:


Hey [~lhofhansl], want to commit this?


[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-12-04 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15041714#comment-15041714
 ] 

Andrew Purtell commented on HBASE-14869:


OK, no further concerns here.


[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-12-03 Thread Vikas Vishwakarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15040776#comment-15040776
 ] 

Vikas Vishwakarma commented on HBASE-14869:
---

For the metrics, I am appending _SizeRangeCount_ or _TimeRangeCount_ to each 
metric name, so range metrics are easy to identify by these fixed patterns, 
which distinguish them from all other metrics. Also, by matching on /Size/ or 
/Time/, it will be easy to process each metric accordingly as a size or time 
metric.
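A sketch of how a consuming script might rely on that convention, written in Java with a hypothetical RangeMetricName helper: one regex recognizes the fixed _SizeRangeCount_ / _TimeRangeCount_ infix and splits out the base metric, the kind (Size or Time), and the range suffix. It assumes the infix never appears elsewhere in a metric name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that splits "Mutate_TimeRangeCount_30-100" into parts.
final class RangeMetricName {
  // group 1: base metric, group 2: Size or Time, group 3: range suffix
  private static final Pattern RANGE_METRIC =
      Pattern.compile("^(.+)_(Size|Time)RangeCount_(.+)$");

  final String base;   // e.g. "Mutate"
  final String kind;   // "Size" or "Time"
  final String range;  // e.g. "30-100"

  private RangeMetricName(String base, String kind, String range) {
    this.base = base;
    this.kind = kind;
    this.range = range;
  }

  // Returns null when the name is not a range metric.
  static RangeMetricName parse(String name) {
    Matcher m = RANGE_METRIC.matcher(name);
    if (!m.matches()) {
      return null;
    }
    return new RangeMetricName(m.group(1), m.group(2), m.group(3));
  }

  public static void main(String[] args) {
    String[] names = {"Mutate_TimeRangeCount_30-100",
        "ResponseSize_SizeRangeCount_0-10", "Mutate_95th_percentile"};
    for (String n : names) {
      RangeMetricName r = parse(n);
      System.out.println(r == null ? n + " -> not a range metric"
          : n + " -> " + r.kind + " range " + r.range + " of " + r.base);
    }
  }
}
```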


[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-12-03 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039719#comment-15039719
 ] 

Andrew Purtell commented on HBASE-14869:


Thanks Vikas. It would be a shame if we wanted to tweak the naming after 
this is committed, that's all. I'm not worried about more than that. 


[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-12-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039738#comment-15039738
 ] 

Lars Hofhansl commented on HBASE-14869:
---

Cool. Metric naming is the only open issue. If nobody else chimes in, I'm good 
with committing.

Maybe [~vik.karma] can report how hard it was to make sense of these new 
metrics in the automated scripts.
In the end any naming is probably fine. The main part I wasn't sure about was 
the "greater than X" naming.
Recall this scheme: "Get_0-1", "Get_1-3", "Get_10-30", etc., and "Get_>60"
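For comparison, a sketch of parsing both suffix styles into numeric bounds (hypothetical RangeSuffix helper; Long.MAX_VALUE stands in for an unbounded upper end). Either form is parseable, but ">X" needs its own branch, which is part of why an "X-inf" form is simpler for scripts.

```java
// Hypothetical helper: "10-30" -> [10, 30], ">60" or "60-inf" -> [60, MAX].
class RangeSuffix {

  static long[] bounds(String suffix) {
    if (suffix.startsWith(">")) {
      // "greater than X" style needs a special case.
      return new long[] {Long.parseLong(suffix.substring(1)), Long.MAX_VALUE};
    }
    int dash = suffix.indexOf('-');
    if (dash <= 0) {
      throw new IllegalArgumentException("Not a range: " + suffix);
    }
    long lower = Long.parseLong(suffix.substring(0, dash));
    String upper = suffix.substring(dash + 1);
    long hi = upper.equals("inf") ? Long.MAX_VALUE : Long.parseLong(upper);
    return new long[] {lower, hi};
  }

  public static void main(String[] args) {
    for (String s : new String[] {"0-1", "10-30", ">60", "60-inf"}) {
      long[] b = bounds(s);
      System.out.println(s + " -> [" + b[0] + ", " + b[1] + "]");
    }
  }
}
```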


[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-12-03 Thread Vikas Vishwakarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039678#comment-15039678
 ] 

Vikas Vishwakarma commented on HBASE-14869:
---

[~apurtell] thanks for the review. We do not have Splunk forwarders for the 
test env, but we already have daily automation scripts running on production 
logs that extract operation latencies, such as Mutate_mean and 
Mutate_95th_percentile, from the periodic HBase metrics dump. Since this is 
just an addition to that metric list, we can easily get these metrics with the 
same script. However, I have tested this only locally on a dev setup; I will 
set it up on a full cluster and run some long-running, high-load tests to 
check for perf impact, CPU usage, etc., and update the test results. Sounds OK? 
If the naming convention or the range values used for these metrics need to be 
changed, I can do that based on suggestions and update the patch.
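A sketch, under stated assumptions, of how such a script could pick the new entries out of a periodic metrics dump shaped like the sample in the first comment ("Name" : number, one pair per line). MetricsDumpParser is a hypothetical name, and the number pattern deliberately ignores exponent notation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for lines like "Mutate_95th_percentile" : 558.5499,
class MetricsDumpParser {
  // group 1: metric name, group 2: integer or decimal value
  private static final Pattern PAIR =
      Pattern.compile("\"([^\"]+)\"\\s*:\\s*(-?[0-9]+(?:\\.[0-9]+)?)");

  static Map<String, Double> parse(String dump) {
    Map<String, Double> out = new LinkedHashMap<>();
    for (String line : dump.split("\n")) {
      Matcher m = PAIR.matcher(line);
      if (m.find()) {
        out.put(m.group(1), Double.parseDouble(m.group(2)));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    String dump = "\"Mutate_mean\" : 160.91325655476598,\n"
        + "\"Mutate_95th_percentile\" : 558.5499,\n"
        + "\"Mutate_TimeRangeCount_30-100\" : 2926,\n";
    // Existing percentile metrics and new range counts come out the same way.
    parse(dump).forEach((k, v) -> System.out.println(k + " = " + v));
  }
}
```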


[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-12-03 Thread Vikas Vishwakarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15040770#comment-15040770
 ] 

Vikas Vishwakarma commented on HBASE-14869:
---

[~lhofhansl] I looked at Splunk, where we have GC logs indexed with statements 
like the one below, which also use the "greater than" symbol (->) between the 
before-GC and after-GC values:

ParNew: 218868K->9270K(235968K), 0.0077550 secs] 255143K->45545K(1520064K)

I ran rex queries to parse it and verified that they work fine: the proper 
field was extracted, so that should be OK.

Splunk query:
"logline" |  rex "->(?<to_gc>[^(]+)" | table _time to_gc

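The same extraction expressed as a Java regex, as a quick check that a ">" in the text does not interfere with field extraction. GcAfterExtractor is a hypothetical name. Java group names may contain only letters and digits, so to_gc becomes toGc here; also, unlike a plain Splunk rex (which keeps only the first match unless max_match is raised), the loop below collects every match on the line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the after-GC sizes that follow each "->".
class GcAfterExtractor {
  private static final Pattern AFTER_GC = Pattern.compile("->(?<toGc>[^(]+)");

  static List<String> afterGcValues(String line) {
    List<String> values = new ArrayList<>();
    Matcher m = AFTER_GC.matcher(line);
    while (m.find()) {
      values.add(m.group("toGc"));
    }
    return values;
  }

  public static void main(String[] args) {
    String line = "ParNew: 218868K->9270K(235968K), 0.0077550 secs] "
        + "255143K->45545K(1520064K)";
    System.out.println(afterGcValues(line)); // [9270K, 45545K]
  }
}
```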

[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-12-03 Thread Vikas Vishwakarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037746#comment-15037746
 ] 

Vikas Vishwakarma commented on HBASE-14869:
---

The core test failure does not look related; it shows the following issue: 
"java.net.BindException: Address already in use".
I fixed the lineLengths issue and added a unit test in the attached patch.


[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-12-03 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038333#comment-15038333
 ] 

Andrew Purtell commented on HBASE-14869:


The latest patches look good to me. 
[~lhofhansl] [~vik.karma], are we sure the new output is consumable and useful 
for the intended purpose? Maybe try this in a test environment (for our 
purposes, with Splunk)?


[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-12-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038186#comment-15038186
 ] 

Hadoop QA commented on HBASE-14869:
---

{color:red}-1 overall{color}.  
{color:green}+1 core zombie tests -- no zombies!{color}.

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16754//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16754//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16754//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16754//console

This message is automatically generated.


[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035506#comment-15035506
 ] 

Hadoop QA commented on HBASE-14869:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12774947/14869-v1-2.0.txt
  against master branch at commit aa41232a877d7a8485bc361fd62150d7c094e9a4.
  ATTACHMENT ID: 12774947

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
new checkstyle errors. Check build console for list of new errors.

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+metaSplitTimeHisto = metricsRegistry.newTimeHistogram(META_SPLIT_TIME_NAME, META_SPLIT_TIME_DESC);
+metaSplitSizeHisto = metricsRegistry.newSizeHistogram(META_SPLIT_SIZE_NAME, META_SPLIT_SIZE_DESC);
+.addCounter(Interns.info(name + "_" + getRangeType() + "_" + prior + "-" + getRange()[i], desc), val);
+  private final long[] ranges = {1,3,10,30,100,300,1000,3000,10000,30000,60000,120000,300000,600000};

{color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:red}-1 core tests{color}.  The patch failed these unit tests:
org.apache.hadoop.hbase.security.token.TestGenerateDelegationToken

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16728//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16728//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16728//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16728//console

This message is automatically generated.

> Better request latency histograms
> -
>
> Key: HBASE-14869
> URL: https://issues.apache.org/jira/browse/HBASE-14869
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Lars Hofhansl
>Assignee: Vikas Vishwakarma
> Attachments: 14869-test-0.98.txt, 14869-v1-0.98.txt, 
> 14869-v1-2.0.txt, 14869-v2-0.98.txt, 14869-v3-0.98.txt, 14869-v4-0.98.txt, 
> AppendSizeTime.png, Get.png
>
>
> I just discussed this with a colleague.
> The get, put, etc, histograms that each region server keeps are somewhat 
> useless (depending on what you want to achieve of course), as they are 
> aggregated and calculated by each region server.
> It would be better to record the number of requests in certain latency 
> bands in addition to what we do now.
> For example the number of gets that took 0-5ms, 6-10ms, 10-20ms, 20-50ms, 
> 50-100ms, 100-1000ms, > 1000ms, etc. (just as an example, should be 
> configurable).
> That way we can do further calculations after the fact, and answer questions 
> like: How often did we miss our SLA? Percentage of requests that missed an 
> SLA, etc.
> Comments?
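A minimal sketch of the banded counting proposed in the description, assuming half-open bands whose hypothetical upper bounds (5, 10, 20, 50, 100, 1000 ms) follow the example given there, plus one overflow band. `LatencyBands` and its methods are illustrative names, not HBase APIs.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical sketch: counts requests per latency band; not an HBase class. */
final class LatencyBands {
  private final long[] upperBoundsMs;   // sorted exclusive upper bounds, e.g. 5, 10, 20 ...
  private final AtomicLongArray counts; // one slot per band, plus one overflow slot

  LatencyBands(long... upperBoundsMs) {
    this.upperBoundsMs = upperBoundsMs.clone();
    Arrays.sort(this.upperBoundsMs);
    this.counts = new AtomicLongArray(this.upperBoundsMs.length + 1);
  }

  /** Slot i holds latencies in [bound[i-1], bound[i]); the last slot holds >= the last bound. */
  void record(long latencyMs) {
    int slot = 0;
    while (slot < upperBoundsMs.length && latencyMs >= upperBoundsMs[slot]) {
      slot++;
    }
    counts.incrementAndGet(slot);
  }

  long total() {
    long sum = 0;
    for (int i = 0; i < counts.length(); i++) {
      sum += counts.get(i);
    }
    return sum;
  }

  /** Requests that took at least thresholdMs; the threshold must be one of the band bounds. */
  long countAtOrAbove(long thresholdMs) {
    int idx = Arrays.binarySearch(upperBoundsMs, thresholdMs);
    if (idx < 0) {
      throw new IllegalArgumentException("threshold must be a band boundary: " + thresholdMs);
    }
    long sum = 0;
    for (int i = idx + 1; i < counts.length(); i++) {
      sum += counts.get(i);
    }
    return sum;
  }

  /** Fraction of requests that missed an SLA of slaMs (0.0 when nothing was recorded). */
  double missFraction(long slaMs) {
    long total = total();
    return total == 0 ? 0.0 : (double) countAtOrAbove(slaMs) / total;
  }
}
```

Because band counts are plain counters, the counts from several region servers can simply be summed before asking SLA questions, which is not possible with per-server percentiles.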



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-12-02 Thread Vikas Vishwakarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035567#comment-15035567
 ] 

Vikas Vishwakarma commented on HBASE-14869:
---

Will check and fix these.



[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-12-01 Thread Vikas Vishwakarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035248#comment-15035248
 ] 

Vikas Vishwakarma commented on HBASE-14869:
---

Also attached the patch for the master branch: 14869-v1-2.0.txt



[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-11-30 Thread Vikas Vishwakarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031581#comment-15031581
 ] 

Vikas Vishwakarma commented on HBASE-14869:
---

Added the patch 14869-v4-0.98.txt, containing the changes for both Hadoop1 and 
Hadoop2. Will submit the patch for the master branch soon.



[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-11-29 Thread Vikas Vishwakarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031292#comment-15031292
 ] 

Vikas Vishwakarma commented on HBASE-14869:
---

[~lhofhansl] please review the attached patch (14869-v3-0.98.txt) with the 
subclass implementation.



[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-11-29 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031322#comment-15031322
 ] 

Lars Hofhansl commented on HBASE-14869:
---

-v3 looks good to me. I wonder whether it's worth making it more generic, i.e. 
have one class and pass the unit and ranges in? Maybe overkill.

Need a matching Hadoop1 change for 0.98 (or we can leave Hadoop1 as is, 
[~apurtell]?), and a version for the master branch.
I'd also like to get input from other folks about whether there are any 
preferences for the metric names ([~davelatham], [~stack], [~enis], any opinions?).
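To make the "one class, pass unit and ranges in" option concrete, here is a minimal sketch. The class `RangeHistogram`, its factory methods, the range-type labels and both sets of bounds (log_3-style milliseconds for time, powers-of-ten bytes for size) are assumptions for illustration, not the code in the attached patches.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical single histogram class: the unit label and the range bounds are parameters. */
final class RangeHistogram {
  private final String rangeType; // e.g. "TimeRangeCount" or "SizeRangeCount" (assumed labels)
  private final long[] ranges;    // sorted upper bounds, in the histogram's unit
  private final AtomicLongArray counts; // one slot per range plus an overflow slot

  RangeHistogram(String rangeType, long[] ranges) {
    this.rangeType = rangeType;
    this.ranges = ranges.clone();
    Arrays.sort(this.ranges);
    this.counts = new AtomicLongArray(this.ranges.length + 1);
  }

  /** Time histogram in milliseconds (assumed bounds). */
  static RangeHistogram forTime() {
    return new RangeHistogram("TimeRangeCount",
        new long[] {1, 3, 10, 30, 100, 300, 1_000, 3_000, 10_000, 30_000, 100_000, 300_000});
  }

  /** Size histogram in bytes (assumed powers-of-ten bounds). */
  static RangeHistogram forSize() {
    return new RangeHistogram("SizeRangeCount",
        new long[] {10, 100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000, 100_000_000});
  }

  /** Increments the band [ranges[i-1], ranges[i]) that contains value. */
  void add(long value) {
    int idx = Arrays.binarySearch(ranges, value);
    counts.incrementAndGet(idx >= 0 ? idx + 1 : -idx - 1);
  }

  String getRangeType() {
    return rangeType;
  }

  long countInBand(int band) {
    return counts.get(band);
  }
}
```

The trade-off is mostly where the per-unit ranges live: a single parameterized class keeps one code path, while subclasses (the route taken in -v3) fix each unit's ranges in its own type.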




[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-11-28 Thread Vikas Vishwakarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030429#comment-15030429
 ] 

Vikas Vishwakarma commented on HBASE-14869:
---

[~lhofhansl] please review the attached patch (14869-v2-0.98.txt) and confirm 
whether this approach is ok. If the review is ok, I will then make the changes 
for all the metrics and for Hadoop1 and submit the final patch.

Changes done:
# Created separate classes MutableTimeHistogram.java and MutableSizeHistogram.java
# Moved the common code for the min, mean, max and count stats into 
MutableStats.java, leaving the snapshot-related code to each specific implementation
# Added integration for the new metric types in the DynamicMetricsRegistry
# At present I have changed only a couple of metrics to time/size based histograms 
(APPEND_SIZE and APPEND_TIME in MetricsWALSourceImpl, and GET_KEY in 
MetricsRegionServerSourceImpl); snapshot attached

Also, some metrics are named like Get, while others have Time/Size postfixed to 
the name, like AppendTime and AppendSize. Currently I have added a _TimeCount_ / 
_SizeCount_ postfix to the metrics, but I will probably just change it to 
_RangeCount_ or something like that?
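A small sketch of the naming question, assuming counter names of the form `<metric>_<rangeType>_<lower>-<upper>`, the pattern built by the `addCounter` line quoted in the Hadoop QA lineLengths report. With one uniform layout, an analysis script can parse every range counter with a single pattern whether or not the base name already ends in Time or Size. `RangeMetricNames` is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for building and parsing range-count metric names. */
final class RangeMetricNames {
  // <metric>_<rangeType>_<lower>-<upper>, e.g. "AppendTime_TimeRangeCount_30-100" (assumed format).
  // The range type is letters only, so base names containing underscores still parse.
  private static final Pattern NAME =
      Pattern.compile("^(.+)_([A-Za-z]+)_(\\d+)-(\\d+)$");

  static String format(String metric, String rangeType, long lower, long upper) {
    return metric + "_" + rangeType + "_" + lower + "-" + upper;
  }

  /** Returns {metric, rangeType, lower, upper}, or null if the name is not a range counter. */
  static String[] parse(String name) {
    Matcher m = NAME.matcher(name);
    if (!m.matches()) {
      return null;
    }
    return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
  }
}
```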






[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-11-28 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030687#comment-15030687
 ] 

Lars Hofhansl commented on HBASE-14869:
---

Why not subclass MutableHistogram? Would save a bunch of code.



[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-11-27 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030186#comment-15030186
 ] 

Lars Hofhansl commented on HBASE-14869:
---

[~vik.karma] offered to finish this up. Thank you, sir!

I think what's left is:
# separate size and time metrics into separate classes (or maybe pass unit and 
ranges in, and have a single class)
# use the time/size metrics at the right spots
# come up with different ranges for the size metrics
# do the same for Hadoop1 (at least in 0.98)
# make sure the reported metric names make sense (and are easy to use for 
machine analysis)





[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-11-25 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15028244#comment-15028244
 ] 

Lars Hofhansl commented on HBASE-14869:
---

Hmm... Looking at all the metrics, I also see metrics like ResponseSize, 
RequestSize, AppendSize, etc. Those also get the ranges now, which doesn't make 
sense because those numbers are bytes, not ms. In the end we'll want the same 
there (byte 90th percentiles cannot be combined either), but I assume we'd want 
different ranges there.

So I might have to invent a new metric type for this, something like 
MutableTimeHistogram.
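A worked illustration of why percentiles cannot be combined across servers while band counts can, using made-up samples (the same argument holds whether the values are milliseconds or bytes). `CombineDemo` is a hypothetical helper, and the percentile uses the nearest-rank definition.

```java
import java.util.Arrays;

/** Hypothetical demo: per-server percentiles do not combine, per-band counts do. */
final class CombineDemo {
  /** Nearest-rank percentile (pct in 1..100) of the given samples. */
  static long percentile(long[] samples, int pct) {
    long[] sorted = samples.clone();
    Arrays.sort(sorted);
    int rank = (pct * sorted.length + 99) / 100; // ceil(pct * n / 100) in integer math
    return sorted[Math.max(rank, 1) - 1];
  }

  /** Number of samples at or above a threshold; such counts add up across servers. */
  static long countAtOrAbove(long[] samples, long threshold) {
    return Arrays.stream(samples).filter(v -> v >= threshold).count();
  }

  /** Concatenates two sample arrays, as if both servers' samples were pooled. */
  static long[] concat(long[] a, long[] b) {
    long[] out = Arrays.copyOf(a, a.length + b.length);
    System.arraycopy(b, 0, out, a.length, b.length);
    return out;
  }
}
```

With ten 1 ms samples on one server and ten 100 ms samples on another, averaging the two per-server 90th percentiles gives 50.5, while the 90th percentile of all twenty samples is 100; the count of samples at or above 30 ms is 0 + 10 = 10 either way.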




[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-11-24 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026291#comment-15026291
 ] 

Lars Hofhansl commented on HBASE-14869:
---

We could also skip reporting ranges without a hit: since we do have the count, 
we can always deduce the total number of requests we've seen.
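A sketch of the sparse reporting idea, assuming the total is published under a `_num_ops` suffix and each non-empty band under `<name>_TimeRangeCount_<lower>-<upper>`; both naming choices, the `-inf` overflow label and the `SparseBandSnapshot` class are hypothetical. Readers treat any band missing from the snapshot as zero, and the reported bands still add up to the total.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical snapshot helper: emits the total count plus only the non-empty bands. */
final class SparseBandSnapshot {
  /**
   * @param name   metric base name, e.g. "Get"
   * @param bounds sorted band upper bounds; band i covers [bounds[i-1], bounds[i])
   * @param counts per-band counts, one longer than bounds (last slot is the overflow band)
   */
  static Map<String, Long> snapshot(String name, long[] bounds, long[] counts) {
    Map<String, Long> out = new LinkedHashMap<>();
    long total = 0;
    for (long c : counts) {
      total += c;
    }
    out.put(name + "_num_ops", total); // assumed suffix for the total count
    for (int i = 0; i < counts.length; i++) {
      if (counts[i] == 0) {
        continue; // skip ranges without a hit; readers treat a missing key as zero
      }
      long lower = i == 0 ? 0 : bounds[i - 1];
      String range = i < bounds.length ? lower + "-" + bounds[i] : lower + "-inf";
      out.put(name + "_TimeRangeCount_" + range, counts[i]);
    }
    return out;
  }
}
```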



[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-11-23 Thread Dave Latham (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022263#comment-15022263
 ] 

Dave Latham commented on HBASE-14869:
-

We have one cluster where we keep everything in block cache and care about the 
low latencies. Other scans with heavy filters can take a long time. I'd 
suggest something like these roughly log_3 bands in milliseconds:
0-1
1-3
3-10
10-30
30-100
100-300
300-1000
1000-3000
3000-10000
10000-30000
30000-100000
100000-300000
300000+ (greater than 5 minutes)
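A sketch of mapping a latency onto the bands suggested above, assuming half-open bands (a value equal to a boundary falls into the higher band, so 3 ms lands in 3-10) and a hypothetical helper class `Log3Bands`. `Arrays.binarySearch` returns the boundary's index on an exact hit and `-(insertion point) - 1` otherwise, which yields the band index directly.

```java
import java.util.Arrays;

/** Hypothetical helper: maps a latency in ms onto the roughly log_3 bands listed above. */
final class Log3Bands {
  // Band boundaries in milliseconds; 300_000 ms is the 5-minute cutoff.
  static final long[] BOUNDS_MS = {
      1, 3, 10, 30, 100, 300, 1_000, 3_000, 10_000, 30_000, 100_000, 300_000
  };

  /** Index of the band holding latencyMs: 0 is "0-1", BOUNDS_MS.length is "300000+". */
  static int bandIndex(long latencyMs) {
    int idx = Arrays.binarySearch(BOUNDS_MS, latencyMs);
    // Exact hit on a boundary: the value opens the next band. Otherwise use the insertion point.
    return idx >= 0 ? idx + 1 : -idx - 1;
  }

  /** Human-readable label such as "30-100" or "300000+". */
  static String bandLabel(long latencyMs) {
    int band = bandIndex(latencyMs);
    if (band == 0) {
      return "0-" + BOUNDS_MS[0];
    }
    if (band == BOUNDS_MS.length) {
      return BOUNDS_MS[band - 1] + "+";
    }
    return BOUNDS_MS[band - 1] + "-" + BOUNDS_MS[band];
  }
}
```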



[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-11-22 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021532#comment-15021532
 ] 

Lars Hofhansl commented on HBASE-14869:
---

So what are good default latency bands? What do people care about?




[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-11-22 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021615#comment-15021615
 ] 

Yu Li commented on HBASE-14869:
---

FWIW, 0-10ms/10-100ms/100-500ms/500-1000ms/>1000ms might be enough?

And at what granularity do you plan to add the latency bands, sir? Per put/get, 
or per call?

Currently it seems we record the latency of each get within a multi 
invocation, but for puts we count a multi as a whole (check 
RSRpcServices#doBatchOp). Maybe we need uniform semantics here before computing 
bands at the fine granularity?

Another concern about SLA is that clients doing batch ops may care more about 
the service time (latency) of the whole batch than about each single put/get. 
Shall we support both fine (single put/get) and coarse (call) granularity?



[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-11-22 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021477#comment-15021477
 ] 

Yu Li commented on HBASE-14869:
---

Yes, after double-checking I found the totalCallTime metric introduced by 
HBASE-8725. Our online cluster is still on 0.98.12, so I didn't see it... 
Agree that adding the latency bands is enough as a first step.



[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-11-22 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020953#comment-15020953
 ] 

Yu Li commented on HBASE-14869:
---

{quote}
It would be better to record the number of requests in certainly latency bands 
in addition to what we do now
{quote}
+1, agree it could better show SLA status this way.

OTOH, I think we should take call queue wait time into account rather than 
only the request-handling time, since I believe that would better reflect the 
latency the client sees.



[jira] [Commented] (HBASE-14869) Better request latency histograms

2015-11-22 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021078#comment-15021078
 ] 

Lars Hofhansl commented on HBASE-14869:
---

I think we have histograms for all these (put time, get time, call time, etc.), 
but the call time does not tell which operation happened.
A first step could just be adding the latency bands to our histograms.

