[jira] [Created] (HBASE-20943) Add offline/online region count into metrics

2018-07-25 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-20943:
--

 Summary: Add offline/online region count into metrics
 Key: HBASE-20943
 URL: https://issues.apache.org/jira/browse/HBASE-20943
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 1.2.6.1, 2.0.0
Reporter: Tianying Chang


We rely heavily on metrics to monitor the health of our HBase production 
clusters. We have seen regions of a table get stuck and fail to come back 
online because an AWS issue corrupted some log files. It would be good to 
catch this early. Although the web UI shows this information, it is not usable 
for automated monitoring. Adding this metric lets us track region state easily 
with our monitoring system. 
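
A minimal sketch of how such counts could be published through the Hadoop 
metrics2 framework that HBase already uses for its other metrics. The class 
and metric names here are illustrative assumptions, not the names of an 
actual patch:

import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

// Hypothetical master-side source publishing region-state counts so an
// external monitoring system can alert on regions stuck offline.
public class RegionStateMetrics {
  private final MetricsRegistry registry = new MetricsRegistry("AssignmentManager");
  private final MutableGaugeLong onlineRegions =
      registry.newGauge("onlineRegionCount", "Number of regions currently online", 0L);
  private final MutableGaugeLong offlineRegions =
      registry.newGauge("offlineRegionCount", "Number of regions currently offline", 0L);

  // Called periodically with counts computed from the master's region states.
  public void update(long online, long offline) {
    onlineRegions.set(online);
    offlineRegions.set(offline);
  }
}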





[jira] [Created] (HBASE-17453) add Ping into HBase server for deprecated GetProtocolVersion

2017-01-11 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-17453:
--

 Summary: add Ping into HBase server for deprecated 
GetProtocolVersion
 Key: HBASE-17453
 URL: https://issues.apache.org/jira/browse/HBASE-17453
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 1.2.2
Reporter: Tianying Chang
Assignee: Tianying Chang
Priority: Minor


Our HBase service is hosted in AWS. We saw cases where the connection between 
the client (AsyncHBase in our case) and the server stopped working without 
throwing any exception, so traffic simply got stuck. We therefore added a 
"Ping" feature to AsyncHBase 1.5 on top of the GetProtocolVersion() API 
provided on the RS side: if there is no traffic for a given time we send a 
"Ping", and if no response comes back we assume the connection is bad and 
reconnect. 

Now we are upgrading our clusters from 94 to 1.2, but GetProtocolVersion() has 
been deprecated. To keep the same detect/reconnect feature, we added Ping() to 
our internal HBase 1.2 branch and patched AsyncHBase 1.7 accordingly.

We would like to open source this feature since it is useful in AWS 
environments. 


We used GetProtocolVersion in AsyncHBase to detect unhealthy connections to 
the RS, because in AWS a connection sometimes enters a state where it silently 
stops carrying traffic.
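
A minimal client-side sketch of the detect/reconnect idea. The sendPing() and 
reconnect() calls below stand in for the actual Ping RPC and AsyncHBase 
reconnect logic and are assumptions, not real API:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Idle-connection probe: if there has been no traffic for idleMillis,
// send a ping; if the ping fails or times out, reconnect.
public class ConnectionProber {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final AtomicLong lastTrafficMillis = new AtomicLong(System.currentTimeMillis());

  // Call on every request/response to mark the connection as active.
  public void onTraffic() {
    lastTrafficMillis.set(System.currentTimeMillis());
  }

  public void start(long idleMillis, long pingTimeoutMillis) {
    scheduler.scheduleAtFixedRate(() -> {
      if (System.currentTimeMillis() - lastTrafficMillis.get() < idleMillis) {
        return; // connection has seen traffic recently, nothing to check
      }
      if (!sendPing(pingTimeoutMillis)) { // hypothetical Ping RPC to the RS
        reconnect();                      // hypothetical tear-down and re-dial
      }
    }, idleMillis, idleMillis, TimeUnit.MILLISECONDS);
  }

  private boolean sendPing(long timeoutMillis) { return true; } // placeholder
  private void reconnect() { }                                  // placeholder
}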





[jira] [Created] (HBASE-16128) add support for p999 histogram metrics

2016-06-27 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-16128:
--

 Summary: add support for p999 histogram metrics
 Key: HBASE-16128
 URL: https://issues.apache.org/jira/browse/HBASE-16128
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 1.2.1
Reporter: Tianying Chang
Assignee: Tianying Chang
Priority: Minor


Currently there is support for p75, p90, and p99, but not for p999. We need 
p999 metrics because the server-side p999 is what shows up as roughly a p99 at 
the client level, especially when the client fans out each call to many 
servers. 
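
For illustration, p999 can be read from the same sampled snapshot that 
already backs p75/p90/p99. A simplified stand-alone version of the 
nearest-rank percentile computation (not the actual HBase histogram code):

import java.util.Arrays;

public final class Percentile {
  // Nearest-rank percentile over a snapshot of samples, e.g. q = 0.999 for p999.
  static long percentile(long[] samples, double q) {
    long[] sorted = samples.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(q * sorted.length); // 1-based nearest rank
    return sorted[Math.max(0, rank - 1)];
  }

  public static void main(String[] args) {
    long[] latencies = new long[10000];
    for (int i = 0; i < latencies.length; i++) latencies[i] = i; // toy data
    System.out.println(percentile(latencies, 0.999)); // prints 9989
  }
}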





[jira] [Resolved] (HBASE-16029) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-15 Thread Tianying Chang (JIRA)

 [ https://issues.apache.org/jira/browse/HBASE-16029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianying Chang resolved HBASE-16029.

Resolution: Duplicate

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16029
> URL: https://issues.apache.org/jira/browse/HBASE-16029
> Project: HBase
>  Issue Type: Improvement
>  Components: hbase, Performance
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>
>  





[jira] [Resolved] (HBASE-16028) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-15 Thread Tianying Chang (JIRA)

 [ https://issues.apache.org/jira/browse/HBASE-16028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianying Chang resolved HBASE-16028.

Resolution: Duplicate

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16028
> URL: https://issues.apache.org/jira/browse/HBASE-16028
> Project: HBase
>  Issue Type: Improvement
>  Components: hbase, Performance
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not accumulate enough data to trigger a flush before 
> the 1-hour limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started 
> together when the cluster starts). 
> Under these two conditions, all regions are flushed around the same time, 
> at startTime + 1 hour - delay, again and again.
> We added a flush jitter to randomize the flush time of each region so that 
> regions are not all flushed at the same time. We have had this feature 
> running in our 94.7 and 94.26 clusters. We recently upgraded to 1.2 and 
> found the issue is still there, so we are porting the fix to the 1.2 branch. 





[jira] [Resolved] (HBASE-16027) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-15 Thread Tianying Chang (JIRA)

 [ https://issues.apache.org/jira/browse/HBASE-16027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianying Chang resolved HBASE-16027.

Resolution: Duplicate

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --
>
> Key: HBASE-16027
> URL: https://issues.apache.org/jira/browse/HBASE-16027
> Project: HBase
>  Issue Type: Bug
>  Components: hbase, Performance
>Affects Versions: 1.2.1
>Reporter: Tianying Chang
>Assignee: Tianying Chang
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush interval of 
> 1 hour). 
> This happens when two conditions are met: 
> 1. the memstore does not accumulate enough data to trigger a flush before 
> the 1-hour limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started 
> together when the cluster starts). 
> Under these two conditions, all regions are flushed around the same time, 
> at startTime + 1 hour - delay, again and again.
> We added a flush jitter to randomize the flush time of each region so that 
> regions are not all flushed at the same time. We have had this feature 
> running in our 94.7 and 94.26 clusters. We recently upgraded to 1.2 and 
> found the issue is still there, so we are porting the fix to the 1.2 branch. 





[jira] [Created] (HBASE-16030) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-14 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-16030:
--

 Summary: All Regions are flushed at about same time when 
MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
 Key: HBASE-16030
 URL: https://issues.apache.org/jira/browse/HBASE-16030
 Project: HBase
  Issue Type: Improvement
Affects Versions: 1.2.1
Reporter: Tianying Chang
Assignee: Tianying Chang


In our production cluster, we observed a memstore flush spike every hour for 
all regions/RS (we use the default memstore periodic flush interval of 1 hour). 

This happens when two conditions are met: 
1. the memstore does not accumulate enough data to trigger a flush before the 
1-hour limit is reached;
2. all regions are opened around the same time (e.g. all RS are started 
together when the cluster starts). 

Under these two conditions, all regions are flushed around the same time, at 
startTime + 1 hour - delay, again and again.

We added a flush jitter to randomize the flush time of each region so that 
regions are not all flushed at the same time. We have had this feature running 
in our 94.7 and 94.26 clusters. We recently upgraded to 1.2 and found the 
issue is still there, so we are porting the fix to the 1.2 branch. 
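
A minimal sketch of the jitter idea, with illustrative names (the real patch 
hooks into the region server's periodic flush check):

import java.util.concurrent.ThreadLocalRandom;

// Each region gets a fixed random offset so periodic flushes spread out
// instead of all firing at startTime + 1 hour.
public class PeriodicFlushDecider {
  private final long flushIntervalMillis; // e.g. 3600000L for the 1-hour default
  private final long jitterMillis;        // fixed per-region offset in [0, maxJitter)

  public PeriodicFlushDecider(long flushIntervalMillis, long maxJitterMillis) {
    this.flushIntervalMillis = flushIntervalMillis;
    this.jitterMillis = ThreadLocalRandom.current().nextLong(maxJitterMillis);
  }

  // True when this region's (interval + its own jitter) has elapsed, so
  // regions opened together no longer flush together.
  public boolean shouldFlush(long lastFlushTimeMillis, long nowMillis) {
    return nowMillis - lastFlushTimeMillis > flushIntervalMillis + jitterMillis;
  }
}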





[jira] [Created] (HBASE-16029) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-14 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-16029:
--

 Summary: All Regions are flushed at about same time when 
MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
 Key: HBASE-16029
 URL: https://issues.apache.org/jira/browse/HBASE-16029
 Project: HBase
  Issue Type: Improvement
  Components: hbase, Performance
Affects Versions: 1.2.1
Reporter: Tianying Chang
Assignee: Tianying Chang


 





[jira] [Created] (HBASE-16028) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-14 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-16028:
--

 Summary: All Regions are flushed at about same time when 
MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
 Key: HBASE-16028
 URL: https://issues.apache.org/jira/browse/HBASE-16028
 Project: HBase
  Issue Type: Improvement
  Components: hbase, Performance
Affects Versions: 1.2.1
Reporter: Tianying Chang
Assignee: Tianying Chang


In our production cluster, we observed a memstore flush spike every hour for 
all regions/RS (we use the default memstore periodic flush interval of 1 hour). 

This happens when two conditions are met: 
1. the memstore does not accumulate enough data to trigger a flush before the 
1-hour limit is reached;
2. all regions are opened around the same time (e.g. all RS are started 
together when the cluster starts). 

Under these two conditions, all regions are flushed around the same time, at 
startTime + 1 hour - delay, again and again.

We added a flush jitter to randomize the flush time of each region so that 
regions are not all flushed at the same time. We have had this feature running 
in our 94.7 and 94.26 clusters. We recently upgraded to 1.2 and found the 
issue is still there, so we are porting the fix to the 1.2 branch. 





[jira] [Created] (HBASE-16027) All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is on, causing flush spike

2016-06-14 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-16027:
--

 Summary: All Regions are flushed at about same time when 
MEMSTORE_PERIODIC_FLUSH is on, causing flush spike
 Key: HBASE-16027
 URL: https://issues.apache.org/jira/browse/HBASE-16027
 Project: HBase
  Issue Type: Bug
  Components: hbase, Performance
Affects Versions: 1.2.1
Reporter: Tianying Chang
Assignee: Tianying Chang


In our production cluster, we observed a memstore flush spike every hour for 
all regions/RS (we use the default memstore periodic flush interval of 1 hour). 

This happens when two conditions are met: 
1. the memstore does not accumulate enough data to trigger a flush before the 
1-hour limit is reached;
2. all regions are opened around the same time (e.g. all RS are started 
together when the cluster starts). 

Under these two conditions, all regions are flushed around the same time, at 
startTime + 1 hour - delay, again and again.

We added a flush jitter to randomize the flush time of each region so that 
regions are not all flushed at the same time. We have had this feature running 
in our 94.7 and 94.26 clusters. We recently upgraded to 1.2 and found the 
issue is still there, so we are porting the fix to the 1.2 branch. 





[jira] [Created] (HBASE-15155) Show All RPC handler tasks stop working after cluster is under heavy load for a while

2016-01-21 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-15155:
--

 Summary: Show All RPC handler tasks stop working after cluster is 
under heavy load for a while
 Key: HBASE-15155
 URL: https://issues.apache.org/jira/browse/HBASE-15155
 Project: HBase
  Issue Type: Bug
  Components: monitoring
Affects Versions: 0.94.19, 1.0.0, 0.98.0
Reporter: Tianying Chang
Assignee: Tianying Chang


After we upgraded from 94.7 to 94.26 and 1.0, we found that the "Show All RPC 
handler status" link on the RS web UI stops working once the production 
cluster has run under relatively high load for several days.  

It turns out to be a bug introduced by 
https://issues.apache.org/jira/browse/HBASE-10312: the BoundedFIFOBuffer 
causes RPC handler statuses to be overwritten/removed permanently whenever a 
spike of non-RPC task statuses exceeds MAX_SIZE (1000). So once the RS has 
experienced high load, RPC status monitoring is gone until the RS is 
restarted. 

We added a unit test that reproduces this, and the fix passes the test.  
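
A sketch of the fix idea: keep RPC handler statuses out of the bounded FIFO 
so a burst of other tasks cannot evict them. Class and method names are 
illustrative, not the committed patch:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TaskMonitorSketch {
  private static final int MAX_SIZE = 1000;

  // General tasks live in a bounded FIFO; the oldest are evicted on overflow.
  private final Deque<String> generalTasks = new ArrayDeque<>();
  // RPC handler statuses are long-lived and tracked separately, so a spike
  // of general tasks can no longer push them out of the buffer.
  private final List<String> rpcHandlerTasks = new ArrayList<>();

  public synchronized void addGeneralTask(String status) {
    if (generalTasks.size() >= MAX_SIZE) {
      generalTasks.removeFirst(); // evict the oldest general task only
    }
    generalTasks.addLast(status);
  }

  public synchronized void addRpcHandlerTask(String status) {
    rpcHandlerTasks.add(status); // never evicted by general-task spikes
  }
}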





[jira] [Created] (HBASE-11765) ReplicationSink should merge the Put/Delete of the same row into one Action even if they are from different hlog entry.

2014-08-15 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-11765:
--

 Summary: ReplicationSink should merge the Put/Delete of the same 
row into one Action even if they are from different hlog entry.
 Key: HBASE-11765
 URL: https://issues.apache.org/jira/browse/HBASE-11765
 Project: HBase
  Issue Type: Improvement
  Components: Performance, Replication
Affects Versions: 0.94.7
Reporter: Tianying Chang
Assignee: Tianying Chang
 Fix For: 0.94.7


The current ReplicationSink code makes sure it creates only one Put/Delete 
action for the KVs of the same row when they come from the same hlog entry. 
However, when Puts/Deletes for the same row arrive in different hlog entries, 
multiple Put/Delete actions are created, and this causes synchronization cost 
during the multi batch operation. 

In one of our applications, whose traffic pattern deletes the same row twice 
for many rows, we saw doMiniBatchMutation() invoked many times because of the 
row lock on the same row. The ReplicationSink side became very slow and the 
replication queue built up. 

We should merge the Puts/Deletes for the same row into one Put/Delete action 
even if they come from different hlog entries. 
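
A sketch of the proposed grouping: collect cells by row across all hlog 
entries in a shipment before building mutations, so each row yields a single 
action. This is a simplified stand-in for the real code, which works with 
HBase KeyValues and Put/Delete objects:

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowGrouper {
  // Group (row, cell) pairs by row regardless of which hlog entry they
  // came from; one batch action per row avoids repeated row-lock
  // contention in doMiniBatchMutation() on the sink side.
  public static Map<ByteBuffer, List<byte[]>> groupByRow(List<byte[][]> rowAndCell) {
    Map<ByteBuffer, List<byte[]>> byRow = new HashMap<>();
    for (byte[][] pair : rowAndCell) {
      ByteBuffer row = ByteBuffer.wrap(pair[0]); // row key as a hashable wrapper
      byRow.computeIfAbsent(row, r -> new ArrayList<>()).add(pair[1]);
    }
    return byRow; // caller builds one Put/Delete per map entry
  }
}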





[jira] [Created] (HBASE-11684) HBase replicationSource should support multithread to ship the log entry

2014-08-05 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-11684:
--

 Summary: HBase replicationSource should support multithread to 
ship the log entry
 Key: HBASE-11684
 URL: https://issues.apache.org/jira/browse/HBASE-11684
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver, Replication
Reporter: Tianying Chang
Assignee: Tianying Chang


We found that the replication rate cannot keep up with the write rate when 
the master cluster is write-heavy, and a huge log queue builds up as a result. 
But when we did a rolling restart of the master cluster, the appliedOpsRate 
doubled because of the extra thread created to recover the log of the 
restarted RS. ReplicateLogEntries is a synchronous blocking call, and it 
becomes the bottleneck when it runs with only one thread. I think the 
replication source should support multiple threads to ship the data. I don't 
see any consistency problem. Any other concerns? 
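
One way to ship with multiple threads while keeping per-region ordering, 
sketched under that assumption (not an actual implementation): hash each 
entry's region to a fixed worker, so entries for the same region are still 
applied in order while different regions ship in parallel.

import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelShipper {
  private final ExecutorService[] workers;

  public ParallelShipper(int nThreads) {
    workers = new ExecutorService[nThreads];
    for (int i = 0; i < nThreads; i++) {
      workers[i] = Executors.newSingleThreadExecutor(); // per-worker FIFO order
    }
  }

  // Entries for the same region always land on the same worker, preserving
  // ordering within a region while regions ship concurrently.
  public void ship(byte[] encodedRegionName, Runnable replicateCall) {
    int idx = Math.floorMod(Arrays.hashCode(encodedRegionName), workers.length);
    workers[idx].execute(replicateCall);
  }
}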





[jira] [Created] (HBASE-10935) support snapshot policy where flush memstore can be skipped to prevent production cluster freeze

2014-04-08 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-10935:
--

 Summary: support snapshot policy where flush memstore can be 
skipped to prevent production cluster freeze
 Key: HBASE-10935
 URL: https://issues.apache.org/jira/browse/HBASE-10935
 Project: HBase
  Issue Type: New Feature
  Components: shell, snapshots
Affects Versions: 0.94.18, 0.94.7
Reporter: Tianying Chang
Assignee: Tianying Chang
Priority: Minor
 Fix For: 0.94.19


We use the snapshot feature for HBase disaster recovery and take snapshots in 
our production cluster periodically. The current flush snapshot policy 
requires all regions of the table to coordinate so that writes are blocked 
while they all flush at the same time. Since we use WALPlayer to recover the 
data that is not in the snapshot HFiles, we don't need the snapshot to do a 
coordinated flush; the snapshot just records the HFiles that are already 
there. 

I added a parameter to the HBase shell so people can choose the no-flush 
snapshot when they need it, as below. The default flush snapshot behavior is 
not affected. 

snapshot 'TestTable', 'TestSnapshot', 'skipFlush'





[jira] [Created] (HBASE-8836) Separate reader and writer thread pool in RegionServer, so that write throughput will not be impacted when the read load is very high

2013-06-28 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-8836:
-

 Summary: Separate reader and writer thread pool in RegionServer, 
so that write throughput will not be impacted when the read load is very high
 Key: HBASE-8836
 URL: https://issues.apache.org/jira/browse/HBASE-8836
 Project: HBase
  Issue Type: New Feature
  Components: Performance, regionserver
Affects Versions: 0.94.8
Reporter: Tianying Chang
Assignee: Tianying Chang
 Fix For: 0.94.8


We found that when the read load on a specific RS is high, write throughput 
is also impacted dramatically, sometimes even causing write data loss. We want 
to prioritize writes by putting them in a separate queue from read requests, 
so that slow reads do not make fast writes wait unnecessarily long.
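
A sketch of the queue separation; the real change belongs in the RS RPC 
scheduler, and the class here is only illustrative. Reads and writes get 
independent pools so read saturation cannot starve writes:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SplitRpcExecutor {
  private final ExecutorService readPool;
  private final ExecutorService writePool;

  public SplitRpcExecutor(int readThreads, int writeThreads) {
    readPool = Executors.newFixedThreadPool(readThreads);
    writePool = Executors.newFixedThreadPool(writeThreads);
  }

  // Writes go to their own pool and queue, so a backlog of slow reads
  // cannot delay them.
  public void dispatch(boolean isWrite, Runnable handler) {
    (isWrite ? writePool : readPool).execute(handler);
  }
}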



[jira] [Resolved] (HBASE-7882) move region level metrics readRequestCount and writeRequestCount to Metric 2

2013-04-02 Thread Tianying Chang (JIRA)

 [ https://issues.apache.org/jira/browse/HBASE-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianying Chang resolved HBASE-7882.
---

  Resolution: Duplicate
Release Note: the metrics are already in metrics2. There is another related 
JIRA for putting these two metrics in 94, which has been committed. This JIRA 
can be closed now. 

 move region level metrics readRequestCount and writeRequestCount to Metric 2 
 -

 Key: HBASE-7882
 URL: https://issues.apache.org/jira/browse/HBASE-7882
 Project: HBase
  Issue Type: Bug
  Components: metrics
Affects Versions: 0.96.0
Reporter: Tianying Chang
Assignee: Tianying Chang
Priority: Minor

 HBASE-7818 is for 94. Following the refactor of HBASE-6410, I need to 
 refactor the 94 patch of HBASE-7818 against metrics2. The patch for 96 will 
 be very different from the 94 one. 



[jira] [Resolved] (HBASE-8044) split/flush/compact/major_compact from hbase shell does not work for region key with \x format

2013-03-15 Thread Tianying Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianying Chang resolved HBASE-8044.
---

Resolution: Duplicate

This bug has been fixed by HBASE-6643 in 0.94, which changed the shell input 
for split/flush/compact to take the encoded region name instead of the full 
region name. That avoids the format-conversion confusion with the full region 
name, so this fix is no longer needed. 

 split/flush/compact/major_compact from hbase shell does not work for region 
 keys with \x format
 --

 Key: HBASE-8044
 URL: https://issues.apache.org/jira/browse/HBASE-8044
 Project: HBase
  Issue Type: Bug
  Components: Admin
Affects Versions: 0.94.5
Reporter: Tianying Chang
Assignee: Tianying Chang
 Fix For: 0.95.0, 0.98.0, 0.94.7

 Attachments: 8044.patch, 8044-trunk.txt, 8044-trunk-v2.txt, 
 8044-v2.patch


 The conversion between bytes and string is incorrect.



[jira] [Created] (HBASE-8085) Backport the fix for Bytes.toStringBinary() into 94

2013-03-12 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-8085:
-

 Summary: Backport the fix for Bytes.toStringBinary() into 94
 Key: HBASE-8085
 URL: https://issues.apache.org/jira/browse/HBASE-8085
 Project: HBase
  Issue Type: Bug
  Components: util
Affects Versions: 0.94.5
Reporter: Tianying Chang
Assignee: Tianying Chang
 Fix For: 0.94.7


There is a bug in Bytes.toStringBinary(): it returns the same string for 
1) byte[] a = {'\\', 'x', 'D', 'A'} and 2) the single byte \xDA. 

It seems this bug has already been fixed in trunk by HBASE-6991. We should 
backport it to 94. 
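
The ambiguity is easy to demonstrate. This snippet only assumes hbase-common 
(for Bytes) on the classpath; before the fix is applied, the two inputs cannot 
be told apart from the output:

import org.apache.hadoop.hbase.util.Bytes;

public class ToStringBinaryDemo {
  public static void main(String[] args) {
    byte[] literal = { '\\', 'x', 'D', 'A' }; // four printable characters: \ x D A
    byte[] single  = { (byte) 0xDA };         // one non-printable byte

    // Both render as the same four-character string "\xDA", so the
    // encoding is not reversible.
    System.out.println(Bytes.toStringBinary(literal));
    System.out.println(Bytes.toStringBinary(single));
  }
}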



[jira] [Created] (HBASE-8044) split/flush/compact/major_compact from hbase shell does not work for region keys with \x format

2013-03-08 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-8044:
-

 Summary: split/flush/compact/major_compact from hbase shell does 
not work for region keys with \x format
 Key: HBASE-8044
 URL: https://issues.apache.org/jira/browse/HBASE-8044
 Project: HBase
  Issue Type: Bug
  Components: Admin
Affects Versions: 0.94.5
Reporter: Tianying Chang
Assignee: Tianying Chang
 Fix For: 0.94.6






[jira] [Created] (HBASE-7896) make rename_table working in 92/94

2013-02-21 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-7896:
-

 Summary: make rename_table working in 92/94
 Key: HBASE-7896
 URL: https://issues.apache.org/jira/browse/HBASE-7896
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.94.5, 0.92.2
Reporter: Tianying Chang
Assignee: Tianying Chang
 Fix For: 0.94.5, 0.92.2


The rename_table function is very useful for our customers. However, 
rename_table.rb does not work in 92/94; it has several bugs. It is worth 
fixing them so that users can rename tables when they need to. 



[jira] [Created] (HBASE-7882) move region level metrics readRequestCount and writeRequestCount to Metric 2

2013-02-19 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-7882:
-

 Summary: move region level metrics readRequestCount and 
writeRequestCount to Metric 2 
 Key: HBASE-7882
 URL: https://issues.apache.org/jira/browse/HBASE-7882
 Project: HBase
  Issue Type: Bug
  Components: metrics
Affects Versions: 0.96.0
Reporter: Tianying Chang
Assignee: Tianying Chang
Priority: Minor
 Fix For: 0.96.0


HBASE-7818 is for 94. Following the refactor of HBASE-6410, I need to 
refactor the 94 patch of HBASE-7818 against metrics2. The patch for 96 will be 
very different from the 94 one. 



[jira] [Created] (HBASE-7816) numericPersistentMetrics should not be cleared for regions that are not being closed.

2013-02-11 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-7816:
-

 Summary: numericPersistentMetrics should not be cleared for 
regions that are not being closed. 
 Key: HBASE-7816
 URL: https://issues.apache.org/jira/browse/HBASE-7816
 Project: HBase
  Issue Type: Bug
  Components: metrics
Affects Versions: 0.94.4
Reporter: Tianying Chang
Assignee: Tianying Chang
 Fix For: 0.94.4


When a region is closed, the region-level dynamic metrics in 
numericPersistentMetrics are cleared for all regions on the same region 
server. That is fine for numericMetrics and timeVaryingMetrics, but not for 
numericPersistentMetrics, because those values are accumulated and are not 
reset at poll time. To keep the values correct, only the metrics of the closed 
region should be cleared; numericPersistentMetrics for other regions should be 
kept. 
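
A sketch of the intended behavior: remove only the entries whose keys belong 
to the closing region instead of calling clear() on the whole map. The key 
naming scheme below is an illustrative assumption:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class PersistentRegionMetrics {
  // Dynamic region metrics keyed like "tbl.<table>.region.<encodedName>.<metric>".
  private final ConcurrentHashMap<String, AtomicLong> numericPersistentMetrics =
      new ConcurrentHashMap<>();

  // On region close, drop only this region's accumulated values; the
  // accumulated counts of all other regions on the same RS are kept.
  public void onRegionClose(String encodedRegionName) {
    String marker = ".region." + encodedRegionName + ".";
    numericPersistentMetrics.keySet().removeIf(key -> key.contains(marker));
  }
}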



[jira] [Created] (HBASE-7818) add region level metrics readRequestCount and writeRequestCount

2013-02-11 Thread Tianying Chang (JIRA)
Tianying Chang created HBASE-7818:
-

 Summary: add region level metrics readRequestCount and 
writeRequestCount 
 Key: HBASE-7818
 URL: https://issues.apache.org/jira/browse/HBASE-7818
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.94.4
Reporter: Tianying Chang
Assignee: Tianying Chang
Priority: Minor
 Fix For: 0.94.6


The request rate at the region server level helps identify a hot region 
server, but it would be even better to identify the hot regions on that 
server. That way we can easily spot unbalanced-region problems. 

Currently, readRequestCount and writeRequestCount per region are exposed in 
the web UI. It would be more useful to expose them through the Hadoop metrics 
framework and/or JMX, so that people can see the history of when a region was 
hot.

I am exposing the existing readRequestCount/writeRequestCount through the 
dynamic region-level metrics framework. I am not converting them to rates 
because our openTSDB takes the raw read/write counts and already applies a 
rate function for display. 
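
A sketch of per-region counters exposed as raw monotonically increasing 
counts, with hypothetical names (the monitoring system, openTSDB here, derives 
the rate downstream):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class RegionRequestCounts {
  private final ConcurrentHashMap<String, LongAdder> readCounts = new ConcurrentHashMap<>();
  private final ConcurrentHashMap<String, LongAdder> writeCounts = new ConcurrentHashMap<>();

  public void onRead(String encodedRegionName) {
    readCounts.computeIfAbsent(encodedRegionName, r -> new LongAdder()).increment();
  }

  public void onWrite(String encodedRegionName) {
    writeCounts.computeIfAbsent(encodedRegionName, r -> new LongAdder()).increment();
  }

  // Published as raw counts; the rate function is applied by the
  // monitoring system, not here.
  public long readCount(String region) {
    LongAdder a = readCounts.get(region);
    return a == null ? 0L : a.sum();
  }

  public long writeCount(String region) {
    LongAdder a = writeCounts.get(region);
    return a == null ? 0L : a.sum();
  }
}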

