[jira] [Commented] (HBASE-8789) Add max RPC version to meta-region-server zk node.

2013-06-26 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693688#comment-13693688
 ] 

Elliott Clark commented on HBASE-8789:
--

It's still 0.  I think the 2 you're seeing is the field number in the protobuf 
definition.

I fill in rpcVersion with RPC_CURRENT_VERSION, which still seems to be 0.
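
For context, in a protobuf message definition the number after the equals sign 
is the field's tag, not its value. A minimal sketch, assuming the zk node's 
message is shaped roughly like this (field names illustrative, not copied from 
the actual .proto):

{code}
message MetaRegionServer {
  // "1" and "2" are field numbers (wire tags), not values.
  required ServerName server = 1;
  // The value stored here can still be 0 (RPC_CURRENT_VERSION).
  optional uint32 rpc_version = 2;
}
{code}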

 Add max RPC version to meta-region-server zk node.
 --

 Key: HBASE-8789
 URL: https://issues.apache.org/jira/browse/HBASE-8789
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC, Zookeeper
Affects Versions: 0.98.0, 0.95.1
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-8789-0.patch, HBASE-8789-1.patch


 For clients to bootstrap themselves they need to know the max RPC version 
 that the meta server will accept.  We should add that to the ZooKeeper node.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates

2013-06-26 Thread Varun Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693798#comment-13693798
 ] 

Varun Sharma commented on HBASE-8370:
-

Here are some stats for this JIRA - I am arguing that the BlockCacheHit ratio 
number reported on a region server does not mean much.

tbl.feeds.cf.home.bt.Index.fsBlockReadCnt : 46864,
tbl.feeds.cf.home.bt.Index.fsBlockReadCacheHitCnt : 46864

Index Block cache hit ratio = 100 %

tbl.feeds.cf.home.bt.Data.fsBlockReadCacheHitCnt : 202
tbl.feeds.cf.home.bt.Data.fsBlockReadCnt : 247

Data Block cache hit ratio = 82 %

Overall cache hit ratio = (46864 + 202) / (46864 + 247) = 99 %

Since index blocks are hit often, their cache hit rate is 100 % and the 
absolute number of hits is high. The real number we are concerned about is the 
82 % hit rate on the data blocks. However, we continue to show 99 % on the 
region server console instead. I think we need to fix that number. Please let 
me know if folks object to this.
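
A quick sketch in plain Java, using the numbers above, of why the aggregate 
number hides the data-block ratio:

{code}
// Index blocks: 46864 reads, 46864 hits; data blocks: 247 reads, 202 hits.
long indexReads = 46864, indexHits = 46864;
long dataReads = 247, dataHits = 202;

// The aggregate ratio is dominated by index hits: ~99.9 %.
double aggregate = 100.0 * (indexHits + dataHits) / (indexReads + dataReads);

// The number that actually matters here: ~81.8 %.
double dataOnly = 100.0 * dataHits / dataReads;
{code}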

 Report data block cache hit rates apart from aggregate cache hit rates
 --

 Key: HBASE-8370
 URL: https://issues.apache.org/jira/browse/HBASE-8370
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Minor

 Attaching from mail to d...@hbase.apache.org
 I am wondering whether the HBase cachingHitRatio metric that the region 
 server UI shows can give me a breakdown by data blocks. I always see this 
 number being very high, and that could be exaggerated by the fact that each 
 lookup hits the index blocks and bloom filter blocks in the block cache 
 before retrieving the data block. This could be artificially inflating the 
 cache hit ratio.
 Assuming the above is correct, do we already have a cache hit ratio for data 
 blocks alone, however obscure? If not, my sense is that it would be 
 pretty valuable to add one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8802) totalCompactingKVs overflow

2013-06-26 Thread Chao Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693838#comment-13693838
 ] 

Chao Shi commented on HBASE-8802:
-

It seems like the TestAccessController failure has nothing to do with this 
patch. It still fails when I run it on my box without this patch.

 totalCompactingKVs overflow
 ---

 Key: HBASE-8802
 URL: https://issues.apache.org/jira/browse/HBASE-8802
 Project: HBase
  Issue Type: Bug
Reporter: Chao Shi
Priority: Trivial
 Attachments: hbase-8802.patch


 I happened to get a very large region (mistakenly bulk loading tons of HFiles 
 into a single region). When it gets compacted, the web UI shows an 
 overflowed totalCompactingKVs. I found this is due to 
 Compactor#FileDetails#maxKeyCount being an int32. It is not a big deal, as 
 this variable is only used for displaying compaction progress; everywhere 
 else uses long.
 totalCompactingKVs=1909276739, currentCompactedKVs=11308733425, 
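 A minimal Java sketch of the wraparound (variable names illustrative; the 
 real field is Compactor#FileDetails#maxKeyCount):
 {code}
 int total32 = 0;    // like the int32 field above
 long total64 = 0L;  // what the rest of the code uses
 for (long keysInFile : new long[] { 1500000000L, 1500000000L }) {
   total32 += keysInFile; // compound assignment silently narrows; wraps negative
   total64 += keysInFile; // 3000000000, as expected
 }
 {code}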

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8496) Implement tags and the internals of how a tag should look like

2013-06-26 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-8496:
--

Attachment: Tag design.pdf

Attaching a simple design document that describes how tags will be supported 
by HBase and the advantage of using KeyValueCodec. 
It also touches on how tags can be implemented in an optional way when we 
don't go with KeyValueCodec.  Please feel free to share your 
comments/reviews.  Thanks to Andy and Anoop for their reviews/suggestions.

 Implement tags and the internals of how a tag should look like
 --

 Key: HBASE-8496
 URL: https://issues.apache.org/jira/browse/HBASE-8496
 Project: HBase
  Issue Type: New Feature
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.98.0

 Attachments: Tag design.pdf


 The intent of this JIRA comes from HBASE-7897.
 This would help us to decide on the structure and format of how the tags 
 should look like. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8793) Regionserver ubuntu's startup script return code always 0

2013-06-26 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693955#comment-13693955
 ] 

Jean-Marc Spaggiari commented on HBASE-8793:


Hi [~stack], thanks for that! However, I tried to close 8803, to modify a 
comment, and to change the status, but I'm still not able to. I can modify a 
few attributes, but not the resolution nor the status. Same in this defect: I 
am not able to modify those fields.

 Regionserver ubuntu's startup script return code always 0
 -

 Key: HBASE-8793
 URL: https://issues.apache.org/jira/browse/HBASE-8793
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.94.6
 Environment: Description:Ubuntu 12.04.2 LTS
 Hbase: 0.94.6+96-1.cdh4.3.0.p0.13~precise-cdh4.3.0
Reporter: Michael Czerwiński
Assignee: Jean-Marc Spaggiari
Priority: Minor

 The hbase-regionserver startup script always returns 0 ('exit 0' at the end 
 of the script). This is wrong behaviour and causes issues when trying to 
 recognise the true status of the service.
 Replacing it with 'exit $?' seems to fix the problem; looking at the 
 hbase-master script, return codes are assigned to a RETVAL variable which is 
 used with exit.
 Not sure if the problem exists in other versions.
 Before fix:
  /etc/init.d/hbase-regionserver.orig status
 hbase-regionserver is not running.
  echo $?
 0
 After fix:
  /etc/init.d/hbase-regionserver status
 hbase-regionserver is not running.
  echo $?
 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8716) Fixups/Improvements for graceful_stop.sh/region_mover.rb

2013-06-26 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693965#comment-13693965
 ] 

Jean-Marc Spaggiari commented on HBASE-8716:


I made some changes (parallelizing the moves) which make it 5 times faster. It 
might be even faster on big clusters. I will push the patch today for comments.

 Fixups/Improvements for graceful_stop.sh/region_mover.rb
 

 Key: HBASE-8716
 URL: https://issues.apache.org/jira/browse/HBASE-8716
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: stack
 Fix For: 0.95.2

 Attachments: 8716.txt


 It has been a while since these scripts were touched.  Giving them a spring 
 cleaning and seeing if we can make them return error codes on failure (it 
 seems the previous style was that the operator would watch the output and 
 react to it, but I see cases where tools want to call these scripts and want 
 a return code to indicate whether the rolling upgrade worked or not).  Also, 
 seeing if we can make the rolling restart faster, since one-by-one, while 
 minimally disruptive and 'safe', is slow on clusters of hundreds of nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8793) Regionserver ubuntu's startup script return code always 0

2013-06-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694006#comment-13694006
 ] 

stack commented on HBASE-8793:
--

Yet...

 Regionserver ubuntu's startup script return code always 0
 -

 Key: HBASE-8793
 URL: https://issues.apache.org/jira/browse/HBASE-8793
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.94.6
 Environment: Description:Ubuntu 12.04.2 LTS
 Hbase: 0.94.6+96-1.cdh4.3.0.p0.13~precise-cdh4.3.0
Reporter: Michael Czerwiński
Assignee: Jean-Marc Spaggiari
Priority: Minor

 The hbase-regionserver startup script always returns 0 ('exit 0' at the end 
 of the script). This is wrong behaviour and causes issues when trying to 
 recognise the true status of the service.
 Replacing it with 'exit $?' seems to fix the problem; looking at the 
 hbase-master script, return codes are assigned to a RETVAL variable which is 
 used with exit.
 Not sure if the problem exists in other versions.
 Before fix:
  /etc/init.d/hbase-regionserver.orig status
 hbase-regionserver is not running.
  echo $?
 0
 After fix:
  /etc/init.d/hbase-regionserver status
 hbase-regionserver is not running.
  echo $?
 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8793) Regionserver ubuntu's startup script return code always 0

2013-06-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694005#comment-13694005
 ] 

stack commented on HBASE-8793:
--

[~jmspaggi] Try again (smile); I added you to the committers' list though you 
ain't one...

 Regionserver ubuntu's startup script return code always 0
 -

 Key: HBASE-8793
 URL: https://issues.apache.org/jira/browse/HBASE-8793
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.94.6
 Environment: Description:Ubuntu 12.04.2 LTS
 Hbase: 0.94.6+96-1.cdh4.3.0.p0.13~precise-cdh4.3.0
Reporter: Michael Czerwiński
Assignee: Jean-Marc Spaggiari
Priority: Minor

 The hbase-regionserver startup script always returns 0 ('exit 0' at the end 
 of the script). This is wrong behaviour and causes issues when trying to 
 recognise the true status of the service.
 Replacing it with 'exit $?' seems to fix the problem; looking at the 
 hbase-master script, return codes are assigned to a RETVAL variable which is 
 used with exit.
 Not sure if the problem exists in other versions.
 Before fix:
  /etc/init.d/hbase-regionserver.orig status
 hbase-regionserver is not running.
  echo $?
 0
 After fix:
  /etc/init.d/hbase-regionserver status
 hbase-regionserver is not running.
  echo $?
 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates

2013-06-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694007#comment-13694007
 ] 

stack commented on HBASE-8370:
--

[~eclark] Mighty Elliott... input?

 Report data block cache hit rates apart from aggregate cache hit rates
 --

 Key: HBASE-8370
 URL: https://issues.apache.org/jira/browse/HBASE-8370
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Minor

 Attaching from mail to d...@hbase.apache.org
 I am wondering whether the HBase cachingHitRatio metric that the region 
 server UI shows can give me a breakdown by data blocks. I always see this 
 number being very high, and that could be exaggerated by the fact that each 
 lookup hits the index blocks and bloom filter blocks in the block cache 
 before retrieving the data block. This could be artificially inflating the 
 cache hit ratio.
 Assuming the above is correct, do we already have a cache hit ratio for data 
 blocks alone, however obscure? If not, my sense is that it would be 
 pretty valuable to add one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8789) Add max RPC version to meta-region-server zk node.

2013-06-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694008#comment-13694008
 ] 

stack commented on HBASE-8789:
--

Duh.  Yes.  +1 on patch.

 Add max RPC version to meta-region-server zk node.
 --

 Key: HBASE-8789
 URL: https://issues.apache.org/jira/browse/HBASE-8789
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC, Zookeeper
Affects Versions: 0.98.0, 0.95.1
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-8789-0.patch, HBASE-8789-1.patch


 For clients to bootstrap themselves they need to know the max RPC version 
 that the meta server will accept.  We should add that to the ZooKeeper node.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time

2013-06-26 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-8803:
---

Affects Version/s: 0.98.0
                   0.94.8
                   0.95.1

 region_mover.rb should move multiple regions at a time
 --

 Key: HBASE-8803
 URL: https://issues.apache.org/jira/browse/HBASE-8803
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.94.8, 0.95.1
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
   Original Estimate: 48h
  Remaining Estimate: 48h

 When there are many regions in a cluster, rolling_restart can take hours 
 because region_mover moves the regions one by one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8793) Regionserver ubuntu's startup script return code always 0

2013-06-26 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694015#comment-13694015
 ] 

Jean-Marc Spaggiari commented on HBASE-8793:


;) Thanks for the yet ;) I just retried and sent you an email about the 
result.

 Regionserver ubuntu's startup script return code always 0
 -

 Key: HBASE-8793
 URL: https://issues.apache.org/jira/browse/HBASE-8793
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.94.6
 Environment: Description:Ubuntu 12.04.2 LTS
 Hbase: 0.94.6+96-1.cdh4.3.0.p0.13~precise-cdh4.3.0
Reporter: Michael Czerwiński
Assignee: Jean-Marc Spaggiari
Priority: Minor

 The hbase-regionserver startup script always returns 0 ('exit 0' at the end 
 of the script). This is wrong behaviour and causes issues when trying to 
 recognise the true status of the service.
 Replacing it with 'exit $?' seems to fix the problem; looking at the 
 hbase-master script, return codes are assigned to a RETVAL variable which is 
 used with exit.
 Not sure if the problem exists in other versions.
 Before fix:
  /etc/init.d/hbase-regionserver.orig status
 hbase-regionserver is not running.
  echo $?
 0
 After fix:
  /etc/init.d/hbase-regionserver status
 hbase-regionserver is not running.
  echo $?
 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8228) Investigate time taken to snapshot memstore

2013-06-26 Thread Amitanand Aiyer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694028#comment-13694028
 ] 

Amitanand Aiyer commented on HBASE-8228:


Looks like this is caused when we are using multiple memstore-flusher threads 
and two flush requests have a log-roll between them.

To flush a region, we grab the HRegion.updatesLock.writeLock and then try to 
grab the HLog.cacheFlushLock.readLock(). Most of the work that happens within 
the lock is done in memory, so it should take only a short time ... unless we 
are waiting to grab the lock.

HLog.rollWriter tries to grab the HLog.cacheFlushLock.writeLock(). This means 
that a log-roll cannot happen when a flush is already in progress.

If a second flush is initiated while a flush is already going on and a 
log-roll is waiting (for the writer's lock), then the second flush is able to 
get the HRegion.updatesLock.writeLock (presumably for a different region) but 
will stall on the HLog.cacheFlushLock.readLock(). This is because the 
read-write lock implementation, which uses NonfairSync, makes reader locks 
wait behind the writer's request if the writer is at the head of the queue.

This interleaving results in the second flush request holding the 
HRegion.updatesLock.writeLock() for as long as the first thread took to flush 
a region plus do a log roll.

Swapping the order of HRegion.updatesLock.writeLock() and startCacheFlush 
should probably fix this issue. Reducing the number of memstore flusher 
threads to 1 can also stop this behavior.
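
A minimal Java sketch of the interleaving described above (lock names follow 
the comment; illustrative only, not the actual HBase code):

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

class FlushVsRollSketch {
  // Non-fair (the default): a queued writer can make newly arriving readers wait.
  final ReentrantReadWriteLock cacheFlushLock = new ReentrantReadWriteLock();

  void flushRegion(ReentrantReadWriteLock updatesLock) {
    updatesLock.writeLock().lock();      // blocks all writes to this region...
    try {
      cacheFlushLock.readLock().lock();  // ...and can stall here behind a queued
      try {                              // rollWriter() writeLock() request
        // snapshot the memstore (in-memory, normally fast)
      } finally {
        cacheFlushLock.readLock().unlock();
      }
    } finally {
      updatesLock.writeLock().unlock();
    }
  }

  void rollWriter() {
    cacheFlushLock.writeLock().lock();   // excludes all concurrent flushes
    try {
      // roll the log
    } finally {
      cacheFlushLock.writeLock().unlock();
    }
  }
}
{code}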

 Investigate time taken to snapshot memstore
 ---

 Key: HBASE-8228
 URL: https://issues.apache.org/jira/browse/HBASE-8228
 Project: HBase
  Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
 Fix For: 0.89-fb


 Snapshotting memstores is normally quick, but sometimes it seems to take a 
 long time. This JIRA tracks the investigation and fix to improve the outliers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time

2013-06-26 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-8803:
---

Attachment: HBASE-8803-v0-trunk.patch

 region_mover.rb should move multiple regions at a time
 --

 Key: HBASE-8803
 URL: https://issues.apache.org/jira/browse/HBASE-8803
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.94.8, 0.95.1
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
 Attachments: HBASE-8803-v0-trunk.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 When there are many regions in a cluster, rolling_restart can take hours 
 because region_mover moves the regions one by one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates

2013-06-26 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694032#comment-13694032
 ] 

Andrew Purtell commented on HBASE-8370:
---

bq.  I think we need to fix that number. 

+1

Configuration settings can change how we handle the different block types on a 
per-type basis. That's only half the story if we do not have per-block-type 
metrics too.

 Report data block cache hit rates apart from aggregate cache hit rates
 --

 Key: HBASE-8370
 URL: https://issues.apache.org/jira/browse/HBASE-8370
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Minor

 Attaching from mail to d...@hbase.apache.org
 I am wondering whether the HBase cachingHitRatio metric that the region 
 server UI shows can give me a breakdown by data blocks. I always see this 
 number being very high, and that could be exaggerated by the fact that each 
 lookup hits the index blocks and bloom filter blocks in the block cache 
 before retrieving the data block. This could be artificially inflating the 
 cache hit ratio.
 Assuming the above is correct, do we already have a cache hit ratio for data 
 blocks alone, however obscure? If not, my sense is that it would be 
 pretty valuable to add one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time

2013-06-26 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-8803:
---

Status: Patch Available  (was: Open)

So here is what I did.

First, when regions are unloaded from the server being restarted, instead of 
moving them region by region to random servers, the script now does it in 
round-robin mode, assigning one region per RS. So if there are 20 RS in the 
cluster, one of them being unloaded, it will move the regions 19 at a time!

Then, to restore the regions, instead of doing it one by one, it now does it 
10 at a time.

As a result, the rolling restart now takes 16 minutes on my cluster instead of 
74 minutes. And the bigger the cluster, the faster it will be.

This version is for review only. Open to comments. I have tested it on 0.94, 
but I don't have a cluster running with trunk, so I'm not able to test it 
there...
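
A rough Java sketch of the unload strategy for illustration (the actual change 
is in region_mover.rb; these names are made up):

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RoundRobinUnloadSketch {
  // Distribute the regions of the server being unloaded across the other
  // servers, one region per target server per round.
  static Map<String, List<String>> plan(List<String> regions, List<String> otherServers) {
    Map<String, List<String>> moves = new HashMap<String, List<String>>();
    int next = 0;
    for (String region : regions) {
      String target = otherServers.get(next++ % otherServers.size());
      if (!moves.containsKey(target)) {
        moves.put(target, new ArrayList<String>());
      }
      moves.get(target).add(region);
    }
    return moves; // with 19 other servers, each round can move 19 regions in parallel
  }
}
{code}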

 region_mover.rb should move multiple regions at a time
 --

 Key: HBASE-8803
 URL: https://issues.apache.org/jira/browse/HBASE-8803
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.1, 0.94.8, 0.98.0
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
 Attachments: HBASE-8803-v0-trunk.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 When there are many regions in a cluster, rolling_restart can take hours 
 because region_mover moves the regions one by one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates

2013-06-26 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HBASE-8370:
--


 Report data block cache hit rates apart from aggregate cache hit rates
 --

 Key: HBASE-8370
 URL: https://issues.apache.org/jira/browse/HBASE-8370
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Minor

 Attaching from mail to d...@hbase.apache.org
 I am wondering whether the HBase cachingHitRatio metric that the region 
 server UI shows can give me a breakdown by data blocks. I always see this 
 number being very high, and that could be exaggerated by the fact that each 
 lookup hits the index blocks and bloom filter blocks in the block cache 
 before retrieving the data block. This could be artificially inflating the 
 cache hit ratio.
 Assuming the above is correct, do we already have a cache hit ratio for data 
 blocks alone, however obscure? If not, my sense is that it would be 
 pretty valuable to add one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates

2013-06-26 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-8370.
--

Resolution: Fixed

 Report data block cache hit rates apart from aggregate cache hit rates
 --

 Key: HBASE-8370
 URL: https://issues.apache.org/jira/browse/HBASE-8370
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Minor

 Attaching from mail to d...@hbase.apache.org
 I am wondering whether the HBase cachingHitRatio metric that the region 
 server UI shows can give me a breakdown by data blocks. I always see this 
 number being very high, and that could be exaggerated by the fact that each 
 lookup hits the index blocks and bloom filter blocks in the block cache 
 before retrieving the data block. This could be artificially inflating the 
 cache hit ratio.
 Assuming the above is correct, do we already have a cache hit ratio for data 
 blocks alone, however obscure? If not, my sense is that it would be 
 pretty valuable to add one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (HBASE-8803) region_mover.rb should move multiple regions at a time

2013-06-26 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari reopened HBASE-8803:



 region_mover.rb should move multiple regions at a time
 --

 Key: HBASE-8803
 URL: https://issues.apache.org/jira/browse/HBASE-8803
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.94.8, 0.95.1
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
 Attachments: HBASE-8803-v0-trunk.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 When there are many regions in a cluster, rolling_restart can take hours 
 because region_mover moves the regions one by one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time

2013-06-26 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-8803:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 region_mover.rb should move multiple regions at a time
 --

 Key: HBASE-8803
 URL: https://issues.apache.org/jira/browse/HBASE-8803
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.94.8, 0.95.1
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
 Attachments: HBASE-8803-v0-trunk.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 When there are many regions in a cluster, rolling_restart can take hours 
 because region_mover moves the regions one by one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time

2013-06-26 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8803:
-

Component/s: Usability

 region_mover.rb should move multiple regions at a time
 --

 Key: HBASE-8803
 URL: https://issues.apache.org/jira/browse/HBASE-8803
 Project: HBase
  Issue Type: Bug
  Components: Usability
Affects Versions: 0.98.0, 0.94.8, 0.95.1
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
 Attachments: HBASE-8803-v0-trunk.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 When there are many regions in a cluster, rolling_restart can take hours 
 because region_mover moves the regions one by one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8803) region_mover.rb should move multiple regions at a time

2013-06-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694044#comment-13694044
 ] 

stack commented on HBASE-8803:
--

Can you make this new behavior optional?  Or at least make how many to do at a 
time a parameter (max concurrent threads moving and restoring)?

Folks want the fast rolling restart for sure -- I know for a fact that a few of 
your customers need it (smile) -- but the good thing about the old behavior is 
that it was minimally disruptive (though slow), so it is good for a cluster 
that is doing heavy serving.

Thanks for working on this important operational issue, [~jmspaggi].

 region_mover.rb should move multiple regions at a time
 --

 Key: HBASE-8803
 URL: https://issues.apache.org/jira/browse/HBASE-8803
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.94.8, 0.95.1
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
 Attachments: HBASE-8803-v0-trunk.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 When there are many regions in a cluster, rolling_restart can take hours 
 because region_mover moves the regions one by one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8228) Investigate time taken to snapshot memstore

2013-06-26 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694064#comment-13694064
 ] 

Himanshu Vashishtha commented on HBASE-8228:


bq. Swapping the order of the HRegion.updatesLock.writeLock(), and 
startCacheFlush should probably fix this issue.
So, you don't lock the region until you get the cacheFlushLock.readLock(). 

In 0.94, cacheFlushLock is still a ReentrantLock. I wonder whether multiple 
memstore flush threads help there at all (if multiple flushers even exist in 
0.94).
In trunk, we no longer write flush events to the hlog, so a flush can 
happen while log rolling is going on.


 Investigate time taken to snapshot memstore
 ---

 Key: HBASE-8228
 URL: https://issues.apache.org/jira/browse/HBASE-8228
 Project: HBase
  Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
 Fix For: 0.89-fb


 Snapshotting memstores is normally quick, but sometimes it seems to take a 
 long time. This JIRA tracks the investigation and fix to improve the outliers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8803) region_mover.rb should move multiple regions at a time

2013-06-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694070#comment-13694070
 ] 

Hadoop QA commented on HBASE-8803:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12589763/HBASE-8803-v0-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.security.access.TestAccessController

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6146//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6146//console

This message is automatically generated.

 region_mover.rb should move multiple regions at a time
 --

 Key: HBASE-8803
 URL: https://issues.apache.org/jira/browse/HBASE-8803
 Project: HBase
  Issue Type: Bug
  Components: Usability
Affects Versions: 0.98.0, 0.94.8, 0.95.1
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
 Attachments: HBASE-8803-v0-trunk.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 When there are many regions in a cluster, rolling_restart can take hours 
 because region_mover moves the regions one by one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8790) NullPointerException thrown when stopping regionserver

2013-06-26 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694087#comment-13694087
 ] 

ramkrishna.s.vasudevan commented on HBASE-8790:
---

+1 on patch.

 NullPointerException thrown when stopping regionserver
 --

 Key: HBASE-8790
 URL: https://issues.apache.org/jira/browse/HBASE-8790
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.95.1
 Environment: CentOS 5.9 x86_64, java version 1.6.0_45, CDH4.3
Reporter: Xiong LIU
Assignee: Liang Xie
 Attachments: HBase-8790.txt


 The HBase cluster is a fresh start with one regionserver.
 When we stop HBase, an unhandled NullPointerException is thrown in the 
 regionserver.
 The regionserver's log is as follows:
 2013-06-21 10:21:11,284 INFO  [regionserver61020] regionserver.HRegionServer: 
 Closing user regions
 2013-06-21 10:21:14,288 DEBUG [regionserver61020] regionserver.HRegionServer: 
 Waiting on 1028785192
 2013-06-21 10:21:14,290 FATAL [regionserver61020] regionserver.HRegionServer: 
 ABORTING region server HOSTNAME_TEST,61020,1371781086817
 : Unhandled: null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:988)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:832)
 at java.lang.Thread.run(Thread.java:662)
 2013-06-21 10:21:14,292 FATAL [regionserver61020] regionserver.HRegionServer: 
 RegionServer abort: loaded coprocessors are: [org.apache
 .hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
 2013-06-21 10:21:14,293 INFO  [regionserver61020] regionserver.HRegionServer: 
 STOPPED: Unhandled: null
 2013-06-21 10:21:14,293 INFO  [regionserver61020] ipc.RpcServer: Stopping 
 server on 61020
 It seems that after closing user regions, the rssStub is null.
 Update:
 We found that if we set hbase.client.ipc.pool.type to RoundRobinPool (or 
 another pool type) and hbase.client.ipc.pool.size to 10 (possibly other 
 values) in hbase-site.xml, the regionserver continuously attempts to connect 
 to the master, and if we then stop HBase, the above NullPointerException 
 occurs. With hbase.client.ipc.pool.size set to 1, the cluster can be stopped 
 completely.
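 A minimal sketch of the kind of guard one would expect here (illustrative 
 only, not necessarily what the attached HBase-8790.txt does; the stub type 
 name is assumed from the 0.95 generated protobuf sources):
 {code}
 // In HRegionServer.tryRegionServerReport(): the stub can be cleared during
 // shutdown, so copy it to a local variable and bail out if it is null.
 RegionServerStatusService.BlockingInterface rss = this.rssStub;
 if (rss == null) {
   return; // server is stopping; there is no master to report to
 }
 // ...otherwise build the report and send it via rss
 {code}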

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8737) [replication] Change replication RPC to use cell blocks

2013-06-26 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8737:
-

Attachment: 0001-HBASE-8737-replication-Change-replication-RPC-to-use.patch

Here is a patch that does both sides.  Trying against hadoopqa.

 [replication] Change replication RPC to use cell blocks
 ---

 Key: HBASE-8737
 URL: https://issues.apache.org/jira/browse/HBASE-8737
 Project: HBase
  Issue Type: Improvement
  Components: Replication
Reporter: Chris Trezzo
Assignee: stack
Priority: Critical
 Fix For: 0.95.2

 Attachments: 
 0001-HBASE-8737-replication-Change-replication-RPC-to-use.patch, 8737.txt


 Currently, the replication rpc that ships edits simply dumps the byte value 
 of WAL edit key/value pairs into a protobuf message.
 Modify the replication rpc mechanism to use cell blocks so it can leverage 
 encoding and compression.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8737) [replication] Change replication RPC to use cell blocks

2013-06-26 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8737:
-

Fix Version/s: 0.98.0
   Status: Patch Available  (was: Open)

 [replication] Change replication RPC to use cell blocks
 ---

 Key: HBASE-8737
 URL: https://issues.apache.org/jira/browse/HBASE-8737
 Project: HBase
  Issue Type: Improvement
  Components: Replication
Reporter: Chris Trezzo
Assignee: stack
Priority: Critical
 Fix For: 0.98.0, 0.95.2

 Attachments: 
 0001-HBASE-8737-replication-Change-replication-RPC-to-use.patch, 8737.txt


 Currently, the replication rpc that ships edits simply dumps the byte value 
 of WAL edit key/value pairs into a protobuf message.
 Modify the replication rpc mechanism to use cell blocks so it can leverage 
 encoding and compression.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8737) [replication] Change replication RPC to use cell blocks

2013-06-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694118#comment-13694118
 ] 

Hadoop QA commented on HBASE-8737:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12589774/0001-HBASE-8737-replication-Change-replication-RPC-to-use.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.TestCheckTestClasses

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6147//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6147//console

This message is automatically generated.

 [replication] Change replication RPC to use cell blocks
 ---

 Key: HBASE-8737
 URL: https://issues.apache.org/jira/browse/HBASE-8737
 Project: HBase
  Issue Type: Improvement
  Components: Replication
Reporter: Chris Trezzo
Assignee: stack
Priority: Critical
 Fix For: 0.98.0, 0.95.2

 Attachments: 
 0001-HBASE-8737-replication-Change-replication-RPC-to-use.patch, 8737.txt


 Currently, the replication rpc that ships edits simply dumps the byte value 
 of WAL edit key/value pairs into a protobuf message.
 Modify the replication rpc mechanism to use cell blocks so it can leverage 
 encoding and compression.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8806) Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for duplicate rows.

2013-06-26 Thread Dave Latham (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694130#comment-13694130
 ] 

Dave Latham commented on HBASE-8806:


A little more background on how this came up.  We're currently replicating 
writes in both directions between two large clusters.  Occasionally we would 
see one node's replication queue start falling behind, and once it got behind 
it appeared to go slower than it did while it was caught up!  It would get into 
a cycle of replicating a batch of 25000 edits with each batch taking something 
like 3 minutes.  Examining threads on the node receiving the writes would show 
the handler thread in stacks like
{noformat}
"IPC Server handler 68 on 60020" daemon prio=10 tid=0x2aaac0d14800 
nid=0x3548 runnable [0x4
   java.lang.Thread.State: RUNNABLE
at java.util.ArrayList.<init>(ArrayList.java:112)
at 
com.google.common.collect.Lists.newArrayListWithCapacity(Lists.java:168)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2129)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2059)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3571)
at sun.reflect.GeneratedMethodAccessor83.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
{noformat}

The 25000 edits were being sorted by row, with many rows ending up having 
multiple puts in a batch.  Each time HRegion.doMiniBatchMutation encountered 
multiple puts to the same row, it would fail to acquire the lock on that row 
for the second put, slowing it down.

This patch makes it able to handle the full batch in one go.
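
A minimal sketch of the dedup idea (illustrative only; acquireRowLock and 
miniBatch are stand-ins, not the patch itself):

{code}
// Acquire each row lock only once per mini-batch; later mutations on the
// same row reuse the lock that is already held.
Set<byte[]> lockedRows = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
for (Mutation m : miniBatch) {
  if (lockedRows.add(m.getRow())) {
    acquireRowLock(m.getRow()); // first mutation for this row takes the lock
  }
  // apply the mutation under the already-held row lock
}
{code}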

 Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for 
 duplicate rows.
 

 Key: HBASE-8806
 URL: https://issues.apache.org/jira/browse/HBASE-8806
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.5
Reporter: rahul gidwani
 Fix For: 0.95.2, 0.94.10

 Attachments: HBASE-8806-0.94.10.patch, HBASE-8806-0.94.10-v2.patch


 If we already have the lock in doMiniBatchMutation we don't need to 
 re-acquire it. The fix is to keep a set of the row keys already locked for 
 the miniBatchMutation; if a rowKey is already in that set, we don't 
 repeatedly try to acquire its lock.
 We have tested this fix in our production environment and it has improved 
 replication performance quite a bit.  We saw a replication batch go from 3+ 
 minutes to less than 10 seconds for batches with duplicate row keys.
 {code}
 static int ACQUIRE_LOCK_COUNT = 0;

 @Test
 public void testRedundantRowKeys() throws Exception {
   final int batchSize = 10;

   String tableName = getClass().getSimpleName();
   Configuration conf = HBaseConfiguration.create();
   conf.setClass(HConstants.REGION_IMPL, MockHRegion.class, HeapSize.class);
   MockHRegion region = (MockHRegion) TestHRegion.initHRegion(
       Bytes.toBytes(tableName), tableName, conf, Bytes.toBytes("a"));
   List<Pair<Mutation, Integer>> someBatch = Lists.newArrayList();
   int i = 0;
   while (i < batchSize) {
     if (i % 2 == 0) {
       someBatch.add(new Pair<Mutation, Integer>(new Put(Bytes.toBytes(0)), null));
     } else {
       someBatch.add(new Pair<Mutation, Integer>(new Put(Bytes.toBytes(1)), null));
     }
     i++;
   }
   long startTime = System.currentTimeMillis();
   region.batchMutate(someBatch.toArray(new Pair[0]));
   long endTime = System.currentTimeMillis();
   long duration = endTime - startTime;
   System.out.println("duration: " + duration + " ms");
   assertEquals(2, ACQUIRE_LOCK_COUNT);
 }

 @Override
 public Integer getLock(Integer lockid, byte[] row, boolean waitForLock)
     throws IOException {
   ACQUIRE_LOCK_COUNT++;
   return super.getLock(lockid, row, waitForLock);
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7411) Use Netflix's Curator zookeeper library

2013-06-26 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-7411:
-

Attachment: 7411v3.txt

Removed TestRecoverableZooKeeper; it expects to be able to replace the zk 
instance, which is not possible when curator is managing zk.

Fixed a place where getZk could come back null (there may be others).

 Use Netflix's Curator zookeeper library
 ---

 Key: HBASE-7411
 URL: https://issues.apache.org/jira/browse/HBASE-7411
 Project: HBase
  Issue Type: New Feature
  Components: Zookeeper
Affects Versions: 0.95.2
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.95.2

 Attachments: 7411v2.txt, 7411v2.txt, 7411v3.txt, hbase-7411_v0.patch


 We have mentioned using the Curator library 
 (https://github.com/Netflix/curator) elsewhere, but we can continue the 
 discussion in this issue.  
 The advantage of the curator lib over ours is the recipes. We have a very 
 similar retrying mechanism, and we don't need much of the nice client-API 
 layer. 
 We also have a similar Listener interface, etc. 
 I think we can decide on one of the following options: 
 1. Do not depend on curator. We have some of the recipes, and some custom 
 recipes (ZKAssign, leader election, etc. already working, locks in HBASE-5991, 
 etc.). We can also copy / fork some code from there.
 2. Replace all of our zk usage / connection management with curator. We may 
 keep the current set of APIs as a thin wrapper. 
 3. Use our own connection management / retry logic, and build a custom 
 CuratorFramework implementation for the curator recipes. This will keep the 
 current zk logic/code intact, and allow us to use curator-recipes as we see 
 fit. 
 4. Allow both curator and our zk layer to manage the connection. We will 
 still have 1 connection, but 2 abstraction layers sharing it. This is the 
 easiest to implement, but a freak show? 
 I have a patch for 4, and am now prototyping 2 or 3, whichever will be less 
 painful. 
 Related issues: 
 HBASE-5547
 HBASE-7305
 HBASE-7212

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7411) Use Netflix's Curator zookeeper library

2013-06-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694143#comment-13694143
 ] 

Hadoop QA commented on HBASE-7411:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12589780/7411v3.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6148//console

This message is automatically generated.

 Use Netflix's Curator zookeeper library
 ---

 Key: HBASE-7411
 URL: https://issues.apache.org/jira/browse/HBASE-7411
 Project: HBase
  Issue Type: New Feature
  Components: Zookeeper
Affects Versions: 0.95.2
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.95.2

 Attachments: 7411v2.txt, 7411v2.txt, 7411v3.txt, hbase-7411_v0.patch


 We have mentioned using the Curator library 
 (https://github.com/Netflix/curator) elsewhere, but we can continue the 
 discussion in this issue.  
 The advantage of the curator lib over ours is the recipes. We have a very 
 similar retrying mechanism, and we don't need much of the nice client-API 
 layer. 
 We also have a similar Listener interface, etc. 
 I think we can decide on one of the following options: 
 1. Do not depend on curator. We have some of the recipes, and some custom 
 recipes (ZKAssign, leader election, etc. already working, locks in HBASE-5991, 
 etc.). We can also copy / fork some code from there.
 2. Replace all of our zk usage / connection management with curator. We may 
 keep the current set of APIs as a thin wrapper. 
 3. Use our own connection management / retry logic, and build a custom 
 CuratorFramework implementation for the curator recipes. This will keep the 
 current zk logic/code intact, and allow us to use curator-recipes as we see 
 fit. 
 4. Allow both curator and our zk layer to manage the connection. We will 
 still have 1 connection, but 2 abstraction layers sharing it. This is the 
 easiest to implement, but a freak show? 
 I have a patch for 4, and am now prototyping 2 or 3, whichever will be less 
 painful. 
 Related issues: 
 HBASE-5547
 HBASE-7305
 HBASE-7212

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8800) Return non-zero exit codes when a region server aborts

2013-06-26 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8800:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Resolving as committed.  Probably too radical a change for 0.94, but I will 
let [~lhofhansl] make the call.

 Return non-zero exit codes when a region server aborts
 --

 Key: HBASE-8800
 URL: https://issues.apache.org/jira/browse/HBASE-8800
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.95.2

 Attachments: HBASE-8800.patch


 There are a few exit code-related jiras flying around, but it seems that at 
 least for the region server we have a bigger problem: it always returns 0 
 when exiting once it's started.
 I also saw that we have a couple of -1 exit codes; AFAIK these should be 1 
 (or at least a positive number).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8806) Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for duplicate rows.

2013-06-26 Thread rahul gidwani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694147#comment-13694147
 ] 

rahul gidwani commented on HBASE-8806:
--

I will provide a patch for trunk, no problem.  I should have it by tomorrow.

 Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for 
 duplicate rows.
 

 Key: HBASE-8806
 URL: https://issues.apache.org/jira/browse/HBASE-8806
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.5
Reporter: rahul gidwani
 Fix For: 0.95.2, 0.94.10

 Attachments: HBASE-8806-0.94.10.patch, HBASE-8806-0.94.10-v2.patch


 If we already have the lock in doMiniBatchMutation we don't need to 
 re-acquire it. The fix is to keep a set of the row keys already locked during 
 the miniBatchMutation and skip acquiring the lock for rows that are already in 
 the set.
 We have tested this fix in our production environment and it has improved 
 replication performance quite a bit. We saw a replication batch go from 3+ 
 minutes to less than 10 seconds for batches with duplicate row keys.
 {code}
 static int ACQUIRE_LOCK_COUNT = 0; // not final: incremented by getLock below

   @Test
   public void testRedundantRowKeys() throws Exception {
     final int batchSize = 10;

     String tableName = getClass().getSimpleName();
     Configuration conf = HBaseConfiguration.create();
     conf.setClass(HConstants.REGION_IMPL, MockHRegion.class, HeapSize.class);
     MockHRegion region = (MockHRegion)
         TestHRegion.initHRegion(Bytes.toBytes(tableName), tableName, conf,
             Bytes.toBytes("a"));
     List<Pair<Mutation, Integer>> someBatch = Lists.newArrayList();
     int i = 0;
     while (i < batchSize) {
       if (i % 2 == 0) {
         someBatch.add(new Pair<Mutation, Integer>(new Put(Bytes.toBytes(0)),
             null));
       } else {
         someBatch.add(new Pair<Mutation, Integer>(new Put(Bytes.toBytes(1)),
             null));
       }
       i++;
     }
     long startTime = System.currentTimeMillis();
     region.batchMutate(someBatch.toArray(new Pair[0]));
     long endTime = System.currentTimeMillis();
     long duration = endTime - startTime;
     System.out.println("duration: " + duration + " ms");
     assertEquals(2, ACQUIRE_LOCK_COUNT);
   }

   @Override
   public Integer getLock(Integer lockid, byte[] row, boolean waitForLock)
       throws IOException {
     ACQUIRE_LOCK_COUNT++;
     return super.getLock(lockid, row, waitForLock);
   }
 {code}
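 A minimal sketch of the dedup idea (names are taken from the test above; how 
 it is wired into HRegion.doMiniBatchMutation is up to the actual patch):
 {code}
 // Sketch: acquire each row lock at most once per batch.
 Set<byte[]> lockedRows = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
 for (Mutation m : batch) {
   byte[] row = m.getRow();
   if (lockedRows.add(row)) {
     // First time this row appears in the batch: take the lock.
     getLock(null, row, true);
   }
   // Later mutations on the same row reuse the already-held lock.
 }
 {code}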

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-26 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694151#comment-13694151
 ] 

Nicolas Liochon commented on HBASE-6295:


For the logs, it's a bug, easy to fix. I will do it.
For the failure itself, the integration test uses a retry count of 10. This is 
not enough. If I increase to 30 it succeeds 5 times out of 5, while I've got a 
60% failure rate with a value of 10. The integration test runs with the value 
found in hbase-server/.../test/resources, and this value was not changed by the 
various jiras we had about this default value.

I will run more tests during the night, but this seems to be it.

 Possible performance improvement in client batch operations: presplit and 
 send in background
 

 Key: HBASE-6295
 URL: https://issues.apache.org/jira/browse/HBASE-6295
 Project: HBase
  Issue Type: Improvement
  Components: Client, Performance
Affects Versions: 0.95.2
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
  Labels: noob
 Fix For: 0.98.0, 0.95.2

 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 
 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 
 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch


 today the batch algo is:
 {noformat}
 for Operation o : List<Op> {
   add o to todolist
   if todolist > maxsize or o last in list
     split todolist per location
     send split lists to region servers
     clear todolist
     wait
 }
 {noformat}
 We could:
 - create immediately the final object instead of an intermediate array
 - split per location immediately
 - instead of sending when the list as a whole is full, send it when there is 
 enough data for a single location
 It would be:
 {noformat}
 for Operation o : List<Op> {
   get location
   add o to location.todolist
   if (location.todolist > maxLocationSize)
     send location.todolist to region server
     clear location.todolist
   // don't wait, continue the loop
 }
 send remaining
 wait
 {noformat}
 It's not trivial to write if you add error management: the retried list must be 
 shared with the operations added in the todolist. But it's doable.
 It's interesting mainly for 'big' writes.
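 A hedged Java sketch of the proposed per-location flow (submitInBackground and 
 waitForAllInFlight are hypothetical helpers; location lookup uses the standard 
 HConnection API):
 {code}
 Map<HRegionLocation, List<Row>> todoByLocation =
     new HashMap<HRegionLocation, List<Row>>();
 for (Row op : ops) {
   HRegionLocation loc = connection.locateRegion(tableName, op.getRow());
   List<Row> todo = todoByLocation.get(loc);
   if (todo == null) {
     todo = new ArrayList<Row>();
     todoByLocation.put(loc, todo);
   }
   todo.add(op);
   if (todo.size() >= maxLocationSize) {
     submitInBackground(loc, todo);               // hypothetical async send
     todoByLocation.put(loc, new ArrayList<Row>());
   }
 }
 for (Map.Entry<HRegionLocation, List<Row>> e : todoByLocation.entrySet()) {
   submitInBackground(e.getKey(), e.getValue());  // send the remainders
 }
 waitForAllInFlight();                            // hypothetical barrier
 {code}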

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7411) Use Netflix's Curator zookeeper library

2013-06-26 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-7411:
-

Attachment: 7411v4.txt

Rebase

 Use Netflix's Curator zookeeper library
 ---

 Key: HBASE-7411
 URL: https://issues.apache.org/jira/browse/HBASE-7411
 Project: HBase
  Issue Type: New Feature
  Components: Zookeeper
Affects Versions: 0.95.2
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.95.2

 Attachments: 7411v2.txt, 7411v2.txt, 7411v3.txt, 7411v4.txt, 
 hbase-7411_v0.patch


 We have mentioned using the Curator library 
 (https://github.com/Netflix/curator) elsewhere but we can continue the 
 discussion in this.  
 The advantage of the curator lib over ours is the recipes. We have a very 
 similar retrying mechanism, and we don't need much of the nice client-API 
 layer.
 We also have a similar Listener interface, etc.
 I think we can decide on one of the following options: 
 1. Do not depend on curator. We have some of the recipes, and some custom 
 recipes (ZKAssign, Leader election, etc already working, locks in HBASE-5991, 
 etc). We can also copy / fork some code from there.
 2. Replace all of our zk usage / connection management to curator. We may 
 keep the current set of API's as a thin wrapper. 
 3. Use our own connection management / retry logic, and build a custom 
 CuratorFramework implementation for the curator recipes. This will keep the 
 current zk logic/code intact, and allow us to use curator-recipes as we see 
 fit. 
 4. Allow both curator and our zk layer to manage the connection. We will 
 still have 1 connection, but 2 abstraction layers sharing it. This is the 
 easiest to implement, but a freak show? 
 I have a patch for 4, and now prototyping 2 or 3 whichever will be less 
 painful. 
 Related issues: 
 HBASE-5547
 HBASE-7305
 HBASE-7212

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-8809) Include deletes in the scan (setRaw) method does not respect the time range or the filter

2013-06-26 Thread Vasu Mariyala (JIRA)
Vasu Mariyala created HBASE-8809:


 Summary: Include deletes in the scan (setRaw) method does not 
respect the time range or the filter
 Key: HBASE-8809
 URL: https://issues.apache.org/jira/browse/HBASE-8809
 Project: HBase
  Issue Type: Bug
  Components: Scanners
Reporter: Vasu Mariyala


If a row has been deleted at time stamp 'T' and a scan with time range (0, T-1) 
is executed, it still returns the delete marker at time stamp 'T'. It is 
because of the code in ScanQueryMatcher.java

  if (retainDeletesInOutput
      || (!isUserScan && (EnvironmentEdgeManager.currentTimeMillis() - 
timestamp) <= timeToPurgeDeletes)
      || kv.getMemstoreTS() > maxReadPointToTrackVersions) {
    // always include or it is not time yet to check whether it is OK
    // to purge deletes or not
    return MatchCode.INCLUDE;
  }

The assumption is that a scan (even with setRaw set to true) should respect the 
filters and the time range specified.

Please let me know if you think this behavior can be changed so that I can 
provide a patch for it.
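
For reference, a sketch of the raw scan in question using the standard client 
API (T stands for the delete timestamp from the example above):

{code}
Scan scan = new Scan();
scan.setRaw(true);        // include delete markers in the results
scan.setTimeRange(0, T);  // [0, T): should exclude the marker written at T
ResultScanner scanner = table.getScanner(scan);
{code}

Today the marker at T still comes back, because the branch above returns 
MatchCode.INCLUDE before the time range or filters are consulted.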

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8732) Changing Encoding on Column Families errors out

2013-06-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694161#comment-13694161
 ] 

stack commented on HBASE-8732:
--

How do you think this happened, [~eclark]? I created a table w/ fast diff and am 
able to scan it.  I then altered the table to disable FAST_DIFF and can still 
scan (even after writing).  You see this on your rig?

 Changing Encoding on Column Families errors out
 ---

 Key: HBASE-8732
 URL: https://issues.apache.org/jira/browse/HBASE-8732
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.0, 0.95.1
Reporter: Elliott Clark
Priority: Critical
 Fix For: 0.95.2


 Getting an error when opening a scanner on a file that has no encoding.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694163#comment-13694163
 ] 

stack commented on HBASE-6295:
--

+1 on committing bug fix and upping retry count as addendum on this issue.

 Possible performance improvement in client batch operations: presplit and 
 send in background
 

 Key: HBASE-6295
 URL: https://issues.apache.org/jira/browse/HBASE-6295
 Project: HBase
  Issue Type: Improvement
  Components: Client, Performance
Affects Versions: 0.95.2
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
  Labels: noob
 Fix For: 0.98.0, 0.95.2

 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 
 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 
 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch


 today the batch algo is:
 {noformat}
 for Operation o : List<Op> {
   add o to todolist
   if todolist > maxsize or o last in list
     split todolist per location
     send split lists to region servers
     clear todolist
     wait
 }
 {noformat}
 We could:
 - create immediately the final object instead of an intermediate array
 - split per location immediately
 - instead of sending when the list as a whole is full, send it when there is 
 enough data for a single location
 It would be:
 {noformat}
 for Operation o : List<Op> {
   get location
   add o to location.todolist
   if (location.todolist > maxLocationSize)
     send location.todolist to region server
     clear location.todolist
   // don't wait, continue the loop
 }
 send remaining
 wait
 {noformat}
 It's not trivial to write if you add error management: the retried list must be 
 shared with the operations added in the todolist. But it's doable.
 It's interesting mainly for 'big' writes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8732) Changing Encoding on Column Families errors out

2013-06-26 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694169#comment-13694169
 ] 

Elliott Clark commented on HBASE-8732:
--

Not really sure. I actually see it locally when running the IT test from 
HBASE-8726 in maven or an IDE.

 Changing Encoding on Column Families errors out
 ---

 Key: HBASE-8732
 URL: https://issues.apache.org/jira/browse/HBASE-8732
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.0, 0.95.1
Reporter: Elliott Clark
Priority: Critical
 Fix For: 0.95.2


 Getting an error when opening a scanner on a file that has no encoding.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-26 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694178#comment-13694178
 ] 

Elliott Clark commented on HBASE-6295:
--

So I see this issue on a real cluster where the local conf is added to the 
classpath ahead of any jars.  How would the test settings be causing this?

 Possible performance improvement in client batch operations: presplit and 
 send in background
 

 Key: HBASE-6295
 URL: https://issues.apache.org/jira/browse/HBASE-6295
 Project: HBase
  Issue Type: Improvement
  Components: Client, Performance
Affects Versions: 0.95.2
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
  Labels: noob
 Fix For: 0.98.0, 0.95.2

 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 
 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 
 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch


 today the batch algo is:
 {noformat}
 for Operation o : List<Op> {
   add o to todolist
   if todolist > maxsize or o last in list
     split todolist per location
     send split lists to region servers
     clear todolist
     wait
 }
 {noformat}
 We could:
 - create immediately the final object instead of an intermediate array
 - split per location immediately
 - instead of sending when the list as a whole is full, send it when there is 
 enough data for a single location
 It would be:
 {noformat}
 for Operation o : List<Op> {
   get location
   add o to location.todolist
   if (location.todolist > maxLocationSize)
     send location.todolist to region server
     clear location.todolist
   // don't wait, continue the loop
 }
 send remaining
 wait
 {noformat}
 It's not trivial to write if you add error management: the retried list must be 
 shared with the operations added in the todolist. But it's doable.
 It's interesting mainly for 'big' writes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8732) Changing Encoding on Column Families errors out

2013-06-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694181#comment-13694181
 ] 

stack commented on HBASE-8732:
--

[~eclark] Let me try...

 Changing Encoding on Column Families errors out
 ---

 Key: HBASE-8732
 URL: https://issues.apache.org/jira/browse/HBASE-8732
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.0, 0.95.1
Reporter: Elliott Clark
Priority: Critical
 Fix For: 0.95.2


 Getting an error when opening a scanner on a file that has no encoding.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8803) region_mover.rb should move multiple regions at a time

2013-06-26 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694180#comment-13694180
 ] 

Jean-Marc Spaggiari commented on HBASE-8803:


Sure!

I will add a parameter like maxthreads with a default value of 1, so with no 
parameters it will act as before, but with this parameter we will be able to 
speed up the process.

 region_mover.rb should move multiple regions at a time
 --

 Key: HBASE-8803
 URL: https://issues.apache.org/jira/browse/HBASE-8803
 Project: HBase
  Issue Type: Bug
  Components: Usability
Affects Versions: 0.98.0, 0.94.8, 0.95.1
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
 Attachments: HBASE-8803-v0-trunk.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 When there are many regions in a cluster, rolling_restart can take hours 
 because region_mover is moving the regions one by one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (HBASE-8776) tweak retry settings some more (on trunk and 0.94)

2013-06-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reopened HBASE-8776:
-


 tweak retry settings some more (on trunk and 0.94)
 --

 Key: HBASE-8776
 URL: https://issues.apache.org/jira/browse/HBASE-8776
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.8
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.95.2, 0.94.10

 Attachments: HBASE-8776-v0.patch, HBASE-8776-v1.patch, 
 HBASE-8776-v1-trunk.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694186#comment-13694186
 ] 

Sergey Shelukhin commented on HBASE-6295:
-

I think it might have been caused by retry tweaking (the thing we discussed 
yesterday about the pause length). The pause is reduced to 100ms on trunk, while 
being 1000ms on 94, so current trunk retries are too short.

 Possible performance improvement in client batch operations: presplit and 
 send in background
 

 Key: HBASE-6295
 URL: https://issues.apache.org/jira/browse/HBASE-6295
 Project: HBase
  Issue Type: Improvement
  Components: Client, Performance
Affects Versions: 0.95.2
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
  Labels: noob
 Fix For: 0.98.0, 0.95.2

 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 
 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 
 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch


 today the batch algo is:
 {noformat}
 for Operation o : List<Op> {
   add o to todolist
   if todolist > maxsize or o last in list
     split todolist per location
     send split lists to region servers
     clear todolist
     wait
 }
 {noformat}
 We could:
 - create immediately the final object instead of an intermediate array
 - split per location immediately
 - instead of sending when the list as a whole is full, send it when there is 
 enough data for a single location
 It would be:
 {noformat}
 for Operation o : List<Op> {
   get location
   add o to location.todolist
   if (location.todolist > maxLocationSize)
     send location.todolist to region server
     clear location.todolist
   // don't wait, continue the loop
 }
 send remaining
 wait
 {noformat}
 It's not trivial to write if you add error management: the retried list must be 
 shared with the operations added in the todolist. But it's doable.
 It's interesting mainly for 'big' writes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8776) tweak retry settings some more (on trunk and 0.94)

2013-06-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694189#comment-13694189
 ] 

Sergey Shelukhin commented on HBASE-8776:
-

Hmm, the trunk pause is apparently reduced to 100ms, so these retry settings 
are only correct for 94. To get the same total length on trunk, we'd need to put 
128 back and dial the retries up to ~35.
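
As a back-of-the-envelope check (sketch only; the multiplier table below is 
illustrative, not necessarily the exact HConstants.RETRY_BACKOFF contents):

{code}
// Rough client-visible retry window: sum(pause * backoff[i]).
long pause = 100;                                  // ms on trunk; 1000 on 0.94
int[] backoff = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32};  // assumed multipliers
long total = 0;
for (int i = 0; i < backoff.length; i++) {
  total += pause * backoff[i];
}
// total = 7100ms with a 100ms pause but 71000ms with a 1000ms pause,
// which is why the same retry count gives very different windows.
System.out.println(total + " ms");
{code}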

 tweak retry settings some more (on trunk and 0.94)
 --

 Key: HBASE-8776
 URL: https://issues.apache.org/jira/browse/HBASE-8776
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.8
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.95.2, 0.94.10

 Attachments: HBASE-8776-v0.patch, HBASE-8776-v1.patch, 
 HBASE-8776-v1-trunk.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8776) tweak retry settings some more (on trunk and 0.94)

2013-06-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694191#comment-13694191
 ] 

Sergey Shelukhin commented on HBASE-8776:
-

or I can just revert the change from trunk and 95. Any objections?

 tweak retry settings some more (on trunk and 0.94)
 --

 Key: HBASE-8776
 URL: https://issues.apache.org/jira/browse/HBASE-8776
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.8
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.95.2, 0.94.10

 Attachments: HBASE-8776-v0.patch, HBASE-8776-v1.patch, 
 HBASE-8776-v1-trunk.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-26 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694194#comment-13694194
 ] 

Elliott Clark commented on HBASE-6295:
--

This started failing before the retry tweaks went in.
And you were correct yesterday: trunk is still at 1000ms as the default pause 
time.  ( 
https://github.com/apache/hbase/blob/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java#L554
 )

 Possible performance improvement in client batch operations: presplit and 
 send in background
 

 Key: HBASE-6295
 URL: https://issues.apache.org/jira/browse/HBASE-6295
 Project: HBase
  Issue Type: Improvement
  Components: Client, Performance
Affects Versions: 0.95.2
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
  Labels: noob
 Fix For: 0.98.0, 0.95.2

 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 
 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 
 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch


 today the batch algo is:
 {noformat}
 for Operation o : List<Op> {
   add o to todolist
   if todolist > maxsize or o last in list
     split todolist per location
     send split lists to region servers
     clear todolist
     wait
 }
 {noformat}
 We could:
 - create immediately the final object instead of an intermediate array
 - split per location immediately
 - instead of sending when the list as a whole is full, send it when there is 
 enough data for a single location
 It would be:
 {noformat}
 for Operation o : List<Op> {
   get location
   add o to location.todolist
   if (location.todolist > maxLocationSize)
     send location.todolist to region server
     clear location.todolist
   // don't wait, continue the loop
 }
 send remaining
 wait
 {noformat}
 It's not trivial to write if you add error management: the retried list must be 
 shared with the operations added in the todolist. But it's doable.
 It's interesting mainly for 'big' writes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time

2013-06-26 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-8803:
---

Attachment: HBASE-8803-v1-trunk.patch

 region_mover.rb should move multiple regions at a time
 --

 Key: HBASE-8803
 URL: https://issues.apache.org/jira/browse/HBASE-8803
 Project: HBase
  Issue Type: Bug
  Components: Usability
Affects Versions: 0.98.0, 0.94.8, 0.95.1
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
 Attachments: HBASE-8803-v0-trunk.patch, HBASE-8803-v1-trunk.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 When there are many regions in a cluster, rolling_restart can take hours 
 because region_mover is moving the regions one by one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time

2013-06-26 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-8803:
---

Status: Patch Available  (was: Reopened)

Here we go.

Updated version:
- Added maxthreads parameter to region_mover.rb and graceful_stop.sh.
- If maxthreads is > the number of RSs then go random, else go roundrobin.

Tested on 0.94, don't have a way to test it on trunk :(
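
The threading pattern itself is simple; a hedged Java sketch of the same idea 
(moveRegion is a hypothetical stand-in for the script's per-region move logic):

{code}
// Move regions with up to maxThreads concurrent movers instead of one by one.
ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
for (final HRegionInfo region : regionsToMove) {
  pool.submit(new Runnable() {
    @Override
    public void run() {
      moveRegion(region);  // hypothetical: issue the move and wait for it
    }
  });
}
pool.shutdown();
pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS); // throws InterruptedException
{code}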

 region_mover.rb should move multiple regions at a time
 --

 Key: HBASE-8803
 URL: https://issues.apache.org/jira/browse/HBASE-8803
 Project: HBase
  Issue Type: Bug
  Components: Usability
Affects Versions: 0.95.1, 0.94.8, 0.98.0
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
 Attachments: HBASE-8803-v0-trunk.patch, HBASE-8803-v1-trunk.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 When there are many regions in a cluster, rolling_restart can take hours 
 because region_mover is moving the regions one by one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-26 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694205#comment-13694205
 ] 

Elliott Clark commented on HBASE-6295:
--

Oh blah, never mind; it's just that the fallback default wasn't changed but the 
xml was. It really is a 100ms base pause time.

 Possible performance improvement in client batch operations: presplit and 
 send in background
 

 Key: HBASE-6295
 URL: https://issues.apache.org/jira/browse/HBASE-6295
 Project: HBase
  Issue Type: Improvement
  Components: Client, Performance
Affects Versions: 0.95.2
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
  Labels: noob
 Fix For: 0.98.0, 0.95.2

 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 
 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 
 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch


 today the batch algo is:
 {noformat}
 for Operation o : List<Op> {
   add o to todolist
   if todolist > maxsize or o last in list
     split todolist per location
     send split lists to region servers
     clear todolist
     wait
 }
 {noformat}
 We could:
 - create immediately the final object instead of an intermediate array
 - split per location immediately
 - instead of sending when the list as a whole is full, send it when there is 
 enough data for a single location
 It would be:
 {noformat}
 for Operation o : List<Op> {
   get location
   add o to location.todolist
   if (location.todolist > maxLocationSize)
     send location.todolist to region server
     clear location.todolist
   // don't wait, continue the loop
 }
 send remaining
 wait
 {noformat}
 It's not trivial to write if you add error management: the retried list must be 
 shared with the operations added in the todolist. But it's doable.
 It's interesting mainly for 'big' writes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-26 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694205#comment-13694205
 ] 

Elliott Clark edited comment on HBASE-6295 at 6/26/13 8:01 PM:
---

Oh blah, never mind; it's just that the fallback default wasn't changed but the 
xml was. It really is a 100ms base pause time. I'll file a jira to make them 
all the same to stop confusion in the future.

  was (Author: eclark):
oh blah never mind it's just that the fall back default wasn't changed but 
the xml was.  It really is 100ms base pause time.
  
 Possible performance improvement in client batch operations: presplit and 
 send in background
 

 Key: HBASE-6295
 URL: https://issues.apache.org/jira/browse/HBASE-6295
 Project: HBase
  Issue Type: Improvement
  Components: Client, Performance
Affects Versions: 0.95.2
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
  Labels: noob
 Fix For: 0.98.0, 0.95.2

 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 
 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 
 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch


 today the batch algo is:
 {noformat}
 for Operation o : List<Op> {
   add o to todolist
   if todolist > maxsize or o last in list
     split todolist per location
     send split lists to region servers
     clear todolist
     wait
 }
 {noformat}
 We could:
 - create immediately the final object instead of an intermediate array
 - split per location immediately
 - instead of sending when the list as a whole is full, send it when there is 
 enough data for a single location
 It would be:
 {noformat}
 for Operation o : List<Op> {
   get location
   add o to location.todolist
   if (location.todolist > maxLocationSize)
     send location.todolist to region server
     clear location.todolist
   // don't wait, continue the loop
 }
 send remaining
 wait
 {noformat}
 It's not trivial to write if you add error management: the retried list must be 
 shared with the operations added in the todolist. But it's doable.
 It's interesting mainly for 'big' writes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7411) Use Netflix's Curator zookeeper library

2013-06-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694216#comment-13694216
 ] 

Hadoop QA commented on HBASE-7411:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12589781/7411v4.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces lines longer than 
100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.security.access.TestAccessController

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:475)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6149//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6149//console

This message is automatically generated.

 Use Netflix's Curator zookeeper library
 ---

 Key: HBASE-7411
 URL: https://issues.apache.org/jira/browse/HBASE-7411
 Project: HBase
  Issue Type: New Feature
  Components: Zookeeper
Affects Versions: 0.95.2
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.95.2

 Attachments: 7411v2.txt, 7411v2.txt, 7411v3.txt, 7411v4.txt, 
 hbase-7411_v0.patch


 We have mentioned using the Curator library 
 (https://github.com/Netflix/curator) elsewhere but we can continue the 
 discussion in this.  
 The advantage of the curator lib over ours is the recipes. We have a very 
 similar retrying mechanism, and we don't need much of the nice client-API 
 layer.
 We also have a similar Listener interface, etc.
 I think we can decide on one of the following options: 
 1. Do not depend on curator. We have some of the recipes, and some custom 
 recipes (ZKAssign, Leader election, etc already working, locks in HBASE-5991, 
 etc). We can also copy / fork some code from there.
 2. Replace all of our zk usage / connection management to curator. We may 
 keep the current set of API's as a thin wrapper. 
 3. Use our own connection management / retry logic, and build a custom 
 CuratorFramework implementation for the curator recipes. This will keep the 
 current zk logic/code intact, and allow us to use curator-recipes as we see 
 fit. 
 4. Allow both curator and our zk layer to manage the connection. We will 
 still have 1 connection, but 2 abstraction layers sharing it. This is the 
 easiest to implement, but a freak show? 
 I have a patch for 4, and now prototyping 2 or 3 whichever will be less 
 painful. 
 Related issues: 
 

[jira] [Created] (HBASE-8810) Bring in code constants in line with default xml's

2013-06-26 Thread Elliott Clark (JIRA)
Elliott Clark created HBASE-8810:


 Summary: Bring in code constants in line with default xml's
 Key: HBASE-8810
 URL: https://issues.apache.org/jira/browse/HBASE-8810
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark


After the defaults were changed in the xml, some constants were left the same.

DEFAULT_HBASE_CLIENT_PAUSE, for example.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8774) Add BatchSize and Filter to Thrift2

2013-06-26 Thread Hamed Madani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694229#comment-13694229
 ] 

Hamed Madani commented on HBASE-8774:
-

Yes Lars. The reason I followed HBASE-4176 was that it made it easier for us to 
move from thrift1 to thrift2. Also, it has been tested before with folks running 
thrift1. Moreover, from what I understand from HBASE-6073, it seems 

{code}
/**
 * Specify boolean operator for TFilterList:
 *  - MUST_PASS_ALL means AND boolean operation
 *  - MUST_PASS_ONE means OR boolean operation
 */
enum TFilterListOperator {
  MUST_PASS_ALL = 0,
  MUST_PASS_ONE = 1
}

/**
 * Represents a server side filter list
 */
struct TFilterList {
  1: required TFilterListOperator operator,
  2: required list<TFilter> filters
}
{code}

limits you to either *AND*-ing all filters or *OR*-ing all of them, whereas with 
HBASE-4176 you can have something like "(Filter1 AND Filter2) OR Filter3".
There is also HBASE-6073's "close the scanner when the scanner doesn't have any 
more results to return" behavior. I can certainly merge the two to include this 
feature. Let me know what you think.
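
For comparison, the nested combination above expressed with the Java client's 
FilterList (standard API; filter1/filter2/filter3 are placeholders):

{code}
FilterList inner = new FilterList(FilterList.Operator.MUST_PASS_ALL);  // AND
inner.addFilter(filter1);
inner.addFilter(filter2);
FilterList outer = new FilterList(FilterList.Operator.MUST_PASS_ONE);  // OR
outer.addFilter(inner);
outer.addFilter(filter3);
scan.setFilter(outer);  // (Filter1 AND Filter2) OR Filter3
{code}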

 Add BatchSize and Filter to Thrift2
 ---

 Key: HBASE-8774
 URL: https://issues.apache.org/jira/browse/HBASE-8774
 Project: HBase
  Issue Type: New Feature
  Components: Thrift
Affects Versions: 0.95.1
Reporter: Hamed Madani
Assignee: Hamed Madani
 Attachments: HBASE_8774.patch


 Attached Patch will add BatchSize and Filter support to Thrift2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8803) region_mover.rb should move multiple regions at a time

2013-06-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694244#comment-13694244
 ] 

Hadoop QA commented on HBASE-8803:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12589789/HBASE-8803-v1-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces lines longer than 
100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.security.access.TestAccessController
  org.apache.hadoop.hbase.TestIOFencing

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6150//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6150//console

This message is automatically generated.

 region_mover.rb should move multiple regions at a time
 --

 Key: HBASE-8803
 URL: https://issues.apache.org/jira/browse/HBASE-8803
 Project: HBase
  Issue Type: Bug
  Components: Usability
Affects Versions: 0.98.0, 0.94.8, 0.95.1
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
 Attachments: HBASE-8803-v0-trunk.patch, HBASE-8803-v1-trunk.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 When there are many regions in a cluster, rolling_restart can take hours 
 because region_mover is moving the regions one by one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8776) tweak retry settings some more (on trunk and 0.94)

2013-06-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694252#comment-13694252
 ] 

stack commented on HBASE-8776:
--

Do whatever you need, [~sershe], to get to 5 minutes or so in trunk.  Thanks.

 tweak retry settings some more (on trunk and 0.94)
 --

 Key: HBASE-8776
 URL: https://issues.apache.org/jira/browse/HBASE-8776
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.8
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.95.2, 0.94.10

 Attachments: HBASE-8776-v0.patch, HBASE-8776-v1.patch, 
 HBASE-8776-v1-trunk.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7411) Use Netflix's Curator zookeeper library

2013-06-26 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-7411:
-

Attachment: 7411v4.txt

Retry

 Use Netflix's Curator zookeeper library
 ---

 Key: HBASE-7411
 URL: https://issues.apache.org/jira/browse/HBASE-7411
 Project: HBase
  Issue Type: New Feature
  Components: Zookeeper
Affects Versions: 0.95.2
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.95.2

 Attachments: 7411v2.txt, 7411v2.txt, 7411v3.txt, 7411v4.txt, 
 7411v4.txt, hbase-7411_v0.patch


 We have mentioned using the Curator library 
 (https://github.com/Netflix/curator) elsewhere but we can continue the 
 discussion in this.  
 The advantage of the curator lib over ours is the recipes. We have a very 
 similar retrying mechanism, and we don't need much of the nice client-API 
 layer.
 We also have a similar Listener interface, etc.
 I think we can decide on one of the following options: 
 1. Do not depend on curator. We have some of the recipes, and some custom 
 recipes (ZKAssign, Leader election, etc already working, locks in HBASE-5991, 
 etc). We can also copy / fork some code from there.
 2. Replace all of our zk usage / connection management to curator. We may 
 keep the current set of API's as a thin wrapper. 
 3. Use our own connection management / retry logic, and build a custom 
 CuratorFramework implementation for the curator recipes. This will keep the 
 current zk logic/code intact, and allow us to use curator-recipes as we see 
 fit. 
 4. Allow both curator and our zk layer to manage the connection. We will 
 still have 1 connection, but 2 abstraction layers sharing it. This is the 
 easiest to implement, but a freak show? 
 I have a patch for 4, and now prototyping 2 or 3 whichever will be less 
 painful. 
 Related issues: 
 HBASE-5547
 HBASE-7305
 HBASE-7212

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8809) Include deletes in the scan (setRaw) method does not respect the time range or the filter

2013-06-26 Thread Jesse Yates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesse Yates updated HBASE-8809:
---

Description: 
If a row has been deleted at time stamp 'T' and a scan with time range (0, T-1) 
is executed, it still returns the delete marker at time stamp 'T'. It is 
because of the code in ScanQueryMatcher.java

{code}
  if (retainDeletesInOutput
      || (!isUserScan && (EnvironmentEdgeManager.currentTimeMillis() - 
timestamp) <= timeToPurgeDeletes)
      || kv.getMemstoreTS() > maxReadPointToTrackVersions) {
    // always include or it is not time yet to check whether it is OK
    // to purge deletes or not
    return MatchCode.INCLUDE;
  }
{code}
The assumption is that a scan (even with setRaw set to true) should respect the 
filters and the time range specified.

Please let me know if you think this behavior can be changed so that I can 
provide a patch for it.

  was:
If a row has been deleted at time stamp 'T' and a scan with time range (0, T-1) 
is executed, it still returns the delete marker at time stamp 'T'. It is 
because of the code in ScanQueryMatcher.java

  if (retainDeletesInOutput
      || (!isUserScan && (EnvironmentEdgeManager.currentTimeMillis() - 
timestamp) <= timeToPurgeDeletes)
      || kv.getMemstoreTS() > maxReadPointToTrackVersions) {
    // always include or it is not time yet to check whether it is OK
    // to purge deletes or not
    return MatchCode.INCLUDE;
  }

The assumption is that a scan (even with setRaw set to true) should respect the 
filters and the time range specified.

Please let me know if you think this behavior can be changed so that I can 
provide a patch for it.


 Include deletes in the scan (setRaw) method does not respect the time range 
 or the filter
 -

 Key: HBASE-8809
 URL: https://issues.apache.org/jira/browse/HBASE-8809
 Project: HBase
  Issue Type: Bug
  Components: Scanners
Reporter: Vasu Mariyala

 If a row has been deleted at time stamp 'T' and a scan with time range (0, 
 T-1) is executed, it still returns the delete marker at time stamp 'T'. It is 
 because of the code in ScanQueryMatcher.java
 {code}
   if (retainDeletesInOutput
       || (!isUserScan && (EnvironmentEdgeManager.currentTimeMillis() - 
 timestamp) <= timeToPurgeDeletes)
       || kv.getMemstoreTS() > maxReadPointToTrackVersions) {
     // always include or it is not time yet to check whether it is OK
     // to purge deletes or not
     return MatchCode.INCLUDE;
   }
 {code}
 The assumption is that a scan (even with setRaw set to true) should respect the 
 filters and the time range specified.
 Please let me know if you think this behavior can be changed so that I can 
 provide a patch for it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8089) Add type support

2013-06-26 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-8089:


Attachment: hbase data types WIP.pdf

Attaching my slides from the Hadoop Summit BoF talk per [~stack]'s suggestion.

 Add type support
 

 Key: HBASE-8089
 URL: https://issues.apache.org/jira/browse/HBASE-8089
 Project: HBase
  Issue Type: New Feature
  Components: Client
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.95.2

 Attachments: HBASE-8089-types.txt, HBASE-8089-types.txt, 
 HBASE-8089-types.txt, HBASE-8089-types.txt, hbase data types WIP.pdf


 This proposal outlines an improvement to HBase that provides for a set of 
 types, above and beyond the existing byte-bucket strategy. This is intended 
 to reduce user-level duplication of effort, provide better support for 
 3rd-party integration, and provide an overall improved experience for 
 developers using HBase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives

2013-06-26 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694262#comment-13694262
 ] 

Nick Dimiduk commented on HBASE-8693:
-

bq. Should this work be in hbase-common rather than in hbase-client?

Initial conversations required the type stuff not be in common. I agree, it 
makes more sense there and I think that community opinion is changing. The 
current implementation doesn't bring in any dependencies, so it should be 
painless.

bq. What is Order here?

{{Order}} is a component from the {{OrderedBytes}} implementation (see patch on 
HBASE-8201). It enables users to store data sorted in ascending or descending 
order. Right now it's mostly a vestigial appendage; I don't know how the data 
types API wants to expose and consume this functionality. I'm hoping to gain 
insight from Phoenix, Kiji, etc. in future reviews.

bq. When would I use isCoercibleTo?

This comes from examination of Phoenix's {{PDataType}}. My understanding is, in 
the absence of secondary indices, the query planner can use type coercion to 
its advantage. This is the part of the data type API that I understand the 
least. I'm hoping for more clarity from [~giacomotaylor].

bq. I see a read on Union4

Sounds like a bug to me.

bq. How I describe a Struct outside of a Struct..?

Examples to follow.

bq. Whats a Binary?

Equivalent to SQL BLOB. This is how a user can inject good old-fashioned 
{{byte[]}}s into a {{Struct}} or {{Union}}.

bq. Do we need all these types?

Great question. That conversation is happening up on HBASE-8089. My preference 
is no, but I think the SQL guys want more of these for better interoperability 
between them.

 Implement extensible type API based on serialization primitives
 ---

 Key: HBASE-8693
 URL: https://issues.apache.org/jira/browse/HBASE-8693
 Project: HBase
  Issue Type: Sub-task
  Components: Client
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.95.2

 Attachments: 0001-HBASE-8693-Extensible-data-types-API.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8799) TestAccessController#testBulkLoad has been failing for some time on trunk/0.95

2013-06-26 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8799:
-

Attachment: 8799.txt

Add some debugging -- include the toString of the exception we are getting.

 TestAccessController#testBulkLoad has been failing for some time on trunk/0.95
 --

 Key: HBASE-8799
 URL: https://issues.apache.org/jira/browse/HBASE-8799
 Project: HBase
  Issue Type: Bug
  Components: Coprocessors, security, test
Affects Versions: 0.98.0, 0.95.2
Reporter: Andrew Purtell
Assignee: stack
 Fix For: 0.95.2

 Attachments: 8799.txt


 I've observed this in Jenkins reports and also while I was working on 
 HBASE-8692, only on trunk/0.95, not on 0.94:
 {quote}
 Failed tests:   
 testBulkLoad(org.apache.hadoop.hbase.security.access.TestAccessController): 
 Expected action to pass for user 'rwuser' but was denied
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8799) TestAccessController#testBulkLoad has been failing for some time on trunk/0.95

2013-06-26 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8799:
-

Fix Version/s: 0.95.2
 Assignee: stack

 TestAccessController#testBulkLoad has been failing for some time on trunk/0.95
 --

 Key: HBASE-8799
 URL: https://issues.apache.org/jira/browse/HBASE-8799
 Project: HBase
  Issue Type: Bug
  Components: Coprocessors, security, test
Affects Versions: 0.98.0, 0.95.2
Reporter: Andrew Purtell
Assignee: stack
 Fix For: 0.95.2

 Attachments: 8799.txt


 I've observed this in Jenkins reports and also while I was working on 
 HBASE-8692, only on trunk/0.95, not on 0.94:
 {quote}
 Failed tests:   
 testBulkLoad(org.apache.hadoop.hbase.security.access.TestAccessController): 
 Expected action to pass for user 'rwuser' but was denied
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8799) TestAccessController#testBulkLoad has been failing for some time on trunk/0.95

2013-06-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694270#comment-13694270
 ] 

stack commented on HBASE-8799:
--

Committed 8799.txt

 TestAccessController#testBulkLoad has been failing for some time on trunk/0.95
 --

 Key: HBASE-8799
 URL: https://issues.apache.org/jira/browse/HBASE-8799
 Project: HBase
  Issue Type: Bug
  Components: Coprocessors, security, test
Affects Versions: 0.98.0, 0.95.2
Reporter: Andrew Purtell
Assignee: stack
 Fix For: 0.95.2

 Attachments: 8799.txt


 I've observed this in Jenkins reports and also while I was working on 
 HBASE-8692, only on trunk/0.95, not on 0.94:
 {quote}
 Failed tests:   
 testBulkLoad(org.apache.hadoop.hbase.security.access.TestAccessController): 
 Expected action to pass for user 'rwuser' but was denied
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8799) TestAccessController#testBulkLoad has been failing for some time on trunk/0.95

2013-06-26 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8799:
-

Status: Patch Available  (was: Open)

Submitting patch to hadoopqa to see if it gets me a message too.

 TestAccessController#testBulkLoad has been failing for some time on trunk/0.95
 --

 Key: HBASE-8799
 URL: https://issues.apache.org/jira/browse/HBASE-8799
 Project: HBase
  Issue Type: Bug
  Components: Coprocessors, security, test
Affects Versions: 0.98.0, 0.95.2
Reporter: Andrew Purtell
Assignee: stack
 Fix For: 0.95.2

 Attachments: 8799.txt


 I've observed this in Jenkins reports and also while I was working on 
 HBASE-8692, only on trunk/0.95, not on 0.94:
 {quote}
 Failed tests:   
 testBulkLoad(org.apache.hadoop.hbase.security.access.TestAccessController): 
 Expected action to pass for user 'rwuser' but was denied
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8799) TestAccessController#testBulkLoad has been failing for some time on trunk/0.95

2013-06-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694278#comment-13694278
 ] 

Hadoop QA commented on HBASE-8799:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12589799/8799.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6152//console

This message is automatically generated.

 TestAccessController#testBulkLoad has been failing for some time on trunk/0.95
 --

 Key: HBASE-8799
 URL: https://issues.apache.org/jira/browse/HBASE-8799
 Project: HBase
  Issue Type: Bug
  Components: Coprocessors, security, test
Affects Versions: 0.98.0, 0.95.2
Reporter: Andrew Purtell
Assignee: stack
 Fix For: 0.95.2

 Attachments: 8799.txt


 I've observed this in Jenkins reports and also while I was working on 
 HBASE-8692, only on trunk/0.95, not on 0.94:
 {quote}
 Failed tests:   
 testBulkLoad(org.apache.hadoop.hbase.security.access.TestAccessController): 
 Expected action to pass for user 'rwuser' but was denied
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates

2013-06-26 Thread Varun Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694296#comment-13694296
 ] 

Varun Sharma commented on HBASE-8370:
-

So, it seems that we have per block type metrics from SchemaMetrics under the 
region server, and they are exposed via /jmx.

The question is which metric we should report on the region server UI. Right 
now all our clusters show a 99 % cache hit ratio, which is misleading, since 
20 % of the time there is a DataBlock miss and we are hitting disk for 20 % of 
requests.

I have been misled by this number in the past, and I think there could be 
others who are being similarly misled. So, should we just report another, more 
representative metric on the region server console?

Varun
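
For illustration, a minimal sketch of deriving a data-block-only hit ratio 
from per-block-type counters (the counter values here are made up, not actual 
HBase metric fields):

{code}
// Sketch: compute the hit ratio over Data blocks only, instead of the
// aggregate ratio (which the index/bloom hits inflate toward 99 %).
public class DataBlockRatio {
  public static void main(String[] args) {
    long dataBlockHits = 800;    // cache hits on Data blocks only
    long dataBlockReads = 1000;  // total Data block reads (hits + misses)
    double ratio = dataBlockReads == 0 ? 0.0
        : (double) dataBlockHits / dataBlockReads;
    System.out.printf("data block hit ratio = %.2f%%%n", 100 * ratio); // 80.00%
  }
}
{code}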

 Report data block cache hit rates apart from aggregate cache hit rates
 --

 Key: HBASE-8370
 URL: https://issues.apache.org/jira/browse/HBASE-8370
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Minor

 Attaching from mail to d...@hbase.apache.org
 I am wondering whether the HBase cachingHitRatio metrics that the region 
 server UI shows, can get me a breakdown by data blocks. I always see this 
 number to be very high, and that could be exaggerated by the fact that each 
 lookup hits the index blocks and bloom filter blocks in the block cache 
 before retrieving the data block. This could be artificially bloating up the 
 cache hit ratio.
 Assuming the above is correct, do we already have a cache hit ratio for data 
 blocks alone which is more obscure ? If not, my sense is that it would be 
 pretty valuable to add one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8802) totalCompactingKVs overflow

2013-06-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694313#comment-13694313
 ] 

Hudson commented on HBASE-8802:
---

Integrated in hbase-0.95 #270 (See 
[https://builds.apache.org/job/hbase-0.95/270/])
HBASE-8802 totalCompactingKVs overflow (Chao Shi) (Revision 1497051)

 Result = FAILURE
sershe : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java


 totalCompactingKVs overflow
 ---

 Key: HBASE-8802
 URL: https://issues.apache.org/jira/browse/HBASE-8802
 Project: HBase
  Issue Type: Bug
Reporter: Chao Shi
Priority: Trivial
 Attachments: hbase-8802.patch


 I happened to get a very large region (mistakenly bulk loading tons of 
 HFiles into a single region). When it's getting compacted, the webUI shows an 
 overflowed totalCompactingKVs. I found this is due to 
 Compactor#FileDetails#maxKeyCount being int32. It is not a big deal, as this 
 variable is only used for displaying compaction progress; everywhere else 
 uses long.
 totalCompactingKVs=1909276739, currentCompactedKVs=11308733425, 
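
 The wrap-around is easy to reproduce in isolation; a minimal sketch of the 
 int32 overflow described above:
 {code}
 // Summing per-file key counts into an int wraps once the total passes
 // Integer.MAX_VALUE (~2.1 billion); a long accumulator does not.
 public class OverflowSketch {
   public static void main(String[] args) {
     long[] keyCountPerFile = {1500000000L, 1500000000L};
     int narrow = 0;
     long wide = 0L;
     for (long c : keyCountPerFile) {
       narrow += (int) c; // wraps around: prints -1294967296
       wide += c;         // correct: prints 3000000000
     }
     System.out.println("int total:  " + narrow);
     System.out.println("long total: " + wide);
   }
 }
 {code}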

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates

2013-06-26 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694320#comment-13694320
 ] 

Elliott Clark commented on HBASE-8370:
--

We don't have per block type metrics in trunk/95 because the overall cache hit 
percentage is a good proxy for data block cache percent.  Yes the overall 
number is higher but it still gives a good actionable number.  You can know if 
you're doing better or worse than you were before.  Even better is the 
derivative of cache miss count.

Overall, SchemaMetrics cost HBase about 10% of its performance, and I just 
don't think enough people got enough out of it to justify keeping per-cf, 
per-block-type metrics.

Maybe we should show the percentage to more decimal places so that it's more 
obvious that there are some misses? But overall, while the UI is nice, it's 
not what should be used for figuring these things out. That should be done by 
your metrics system (CM, Ganglia, OpenTSDB, etc.).

 Report data block cache hit rates apart from aggregate cache hit rates
 --

 Key: HBASE-8370
 URL: https://issues.apache.org/jira/browse/HBASE-8370
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Minor

 Attaching from mail to d...@hbase.apache.org
 I am wondering whether the HBase cachingHitRatio metrics that the region 
 server UI shows, can get me a breakdown by data blocks. I always see this 
 number to be very high, and that could be exaggerated by the fact that each 
 lookup hits the index blocks and bloom filter blocks in the block cache 
 before retrieving the data block. This could be artificially bloating up the 
 cache hit ratio.
 Assuming the above is correct, do we already have a cache hit ratio for data 
 blocks alone which is more obscure ? If not, my sense is that it would be 
 pretty valuable to add one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8228) Investigate time taken to snapshot memstore

2013-06-26 Thread Amitanand Aiyer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694319#comment-13694319
 ] 

Amitanand Aiyer commented on HBASE-8228:


I think open source is using one memstore flusher.

Multiple memstore flush threads were added pretty recently. This was part of 
the effort to reduce the time it takes to send machines to repair, etc. 

 Investigate time taken to snapshot memstore
 ---

 Key: HBASE-8228
 URL: https://issues.apache.org/jira/browse/HBASE-8228
 Project: HBase
  Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
 Fix For: 0.89-fb


 Snapshotting memstores is normally quick. But, sometimes it seems to take 
 long. This JIRA is to track the investigation and fix to improve the outliers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7411) Use Netflix's Curator zookeeper library

2013-06-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694332#comment-13694332
 ] 

Hadoop QA commented on HBASE-7411:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12589796/7411v4.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces lines longer than 
100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestHCM
  org.apache.hadoop.hbase.security.access.TestAccessController

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:475)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6151//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6151//console

This message is automatically generated.

 Use Netflix's Curator zookeeper library
 ---

 Key: HBASE-7411
 URL: https://issues.apache.org/jira/browse/HBASE-7411
 Project: HBase
  Issue Type: New Feature
  Components: Zookeeper
Affects Versions: 0.95.2
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.95.2

 Attachments: 7411v2.txt, 7411v2.txt, 7411v3.txt, 7411v4.txt, 
 7411v4.txt, hbase-7411_v0.patch


 We have mentioned using the Curator library 
 (https://github.com/Netflix/curator) elsewhere but we can continue the 
 discussion in this.  
 The advantages for the curator lib over ours are the recipes. We have very 
 similar retrying mechanism, and we don't need much of the nice client-API 
 layer. 
 We also have similar Listener interface, etc. 
 I think we can decide on one of the following options: 
 1. Do not depend on curator. We have some of the recipes, and some custom 
 recipes (ZKAssign, Leader election, etc already working, locks in HBASE-5991, 
 etc). We can also copy / fork some code from there.
 2. Replace all of our zk usage / connection management to curator. We may 
 keep the current set of API's as a thin wrapper. 
 3. Use our own connection management / retry logic, and build a custom 
 CuratorFramework implementation for the curator recipes. This will keep the 
 current zk logic/code intact, and allow us to use curator-recipes as we see 
 fit. 
 4. Allow both curator and our zk layer to manage the connection. We will 
 still have 1 connection, but 2 abstraction layers sharing it. This is the 
 easiest to implement, but a freak show? 
 I have a patch for 4, and now prototyping 2 

[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates

2013-06-26 Thread Varun Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694336#comment-13694336
 ] 

Varun Sharma commented on HBASE-8370:
-

bq.We don't have per block type metrics in trunk/95 because the overall cache 
hit percentage is a good proxy for data block cache percent. Yes the overall 
number is higher but it still gives a good actionable number. You can know if 
you're doing better or worse than you were before. Even better is the 
derivative of cache miss count.

I am not sure this is true - this number is always 99 % for us on all 
clusters - blockCacheHitCachingRatio - how can a number which never changes 
ever be actionable? Even with decimal places, it is never going to change, 
because the index blocks are going to take over.

Also, the difference between an 82 % cache hit ratio and a 99 % cache hit 
ratio is enormous. Controlling your p80 on latency is a *lot* easier than 
your p99. A cache hit ratio of 99 % just sends you this false sense of 
security that you have controlled your p99 latency. This is important for 
online serving; it may not be for enterprise.

I guess we don't need to bring back SchemaMetrics to fix this, but we can 
have block level metrics. At least I want to be sure that Index blocks have 
100 % cache hit rates, because if that's not happening, then I am in a bad 
situation. It would be better not to have folks using HBase for online 
storage play a guessing game as to the true effectiveness of the cache.


 Report data block cache hit rates apart from aggregate cache hit rates
 --

 Key: HBASE-8370
 URL: https://issues.apache.org/jira/browse/HBASE-8370
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Minor

 Attaching from mail to d...@hbase.apache.org
 I am wondering whether the HBase cachingHitRatio metrics that the region 
 server UI shows, can get me a breakdown by data blocks. I always see this 
 number to be very high, and that could be exaggerated by the fact that each 
 lookup hits the index blocks and bloom filter blocks in the block cache 
 before retrieving the data block. This could be artificially bloating up the 
 cache hit ratio.
 Assuming the above is correct, do we already have a cache hit ratio for data 
 blocks alone which is more obscure ? If not, my sense is that it would be 
 pretty valuable to add one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates

2013-06-26 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694343#comment-13694343
 ] 

Elliott Clark commented on HBASE-8370:
--

bq.this number is always 99 % for us on all clusters
That's why I said we need more decimal places for it.

bq.Also, the difference between an 82 % cache hit ratio and a 99 % cache hit 
ratio is enormous.
But that 82% doesn't tell you anything all by itself.  For a given workload, 
is 80% good or bad?  You don't know.  That percentage is really only useful if 
you have a baseline, so it's equally informative if the cache percentage goes 
from 99 to 98 or from 84 to 83.  Additionally, gauges are bad.  They just 
don't tell a great story.  There's a lot of lossy data there; sampling times 
can skew your picture of what's actually happening.  See [~phobos182]'s slides 
(https://speakerdeck.com/phobos182/metrics-at-pinterest) on why you should 
prefer counters over gauges. 

That's why I said that derivative of cache miss count is the best way to look 
at cache efficacy.  It gives you an accurate count of the number of times you 
have to go to hdfs (not really disk since there can be os cache there).  It 
also provides a good way to compare today to yesterday.
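
For concreteness, the derivative of a miss counter just means sampling the 
monotonically increasing count at a fixed interval and reporting the delta, 
the way Ganglia/OpenTSDB derive a rate. A self-contained sketch (the counter 
here is simulated, not read from the real block cache):

{code}
import java.util.concurrent.atomic.AtomicLong;

public class MissRateSketch {
  public static void main(String[] args) throws InterruptedException {
    AtomicLong missCounter = new AtomicLong(); // stand-in for the cache's counter
    long prev = missCounter.get();
    for (int i = 0; i < 3; i++) {
      Thread.sleep(1000);
      missCounter.addAndGet(50);  // simulate 50 misses during this second
      long cur = missCounter.get();
      System.out.println("misses/sec = " + (cur - prev));
      prev = cur;
    }
  }
}
{code}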

 Report data block cache hit rates apart from aggregate cache hit rates
 --

 Key: HBASE-8370
 URL: https://issues.apache.org/jira/browse/HBASE-8370
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Minor

 Attaching from mail to d...@hbase.apache.org
 I am wondering whether the HBase cachingHitRatio metrics that the region 
 server UI shows, can get me a breakdown by data blocks. I always see this 
 number to be very high, and that could be exaggerated by the fact that each 
 lookup hits the index blocks and bloom filter blocks in the block cache 
 before retrieving the data block. This could be artificially bloating up the 
 cache hit ratio.
 Assuming the above is correct, do we already have a cache hit ratio for data 
 blocks alone which is more obscure ? If not, my sense is that it would be 
 pretty valuable to add one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8802) totalCompactingKVs overflow

2013-06-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694357#comment-13694357
 ] 

Hudson commented on HBASE-8802:
---

Integrated in hbase-0.95-on-hadoop2 #150 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/150/])
HBASE-8802 totalCompactingKVs overflow (Chao Shi) (Revision 1497051)

 Result = FAILURE
sershe : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java


 totalCompactingKVs overflow
 ---

 Key: HBASE-8802
 URL: https://issues.apache.org/jira/browse/HBASE-8802
 Project: HBase
  Issue Type: Bug
Reporter: Chao Shi
Priority: Trivial
 Attachments: hbase-8802.patch


 I happened to get a very large region (mistakenly bulk loading tons of 
 HFiles into a single region). When it's getting compacted, the webUI shows an 
 overflowed totalCompactingKVs. I found this is due to 
 Compactor#FileDetails#maxKeyCount being int32. It is not a big deal, as this 
 variable is only used for displaying compaction progress; everywhere else 
 uses long.
 totalCompactingKVs=1909276739, currentCompactedKVs=11308733425, 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates

2013-06-26 Thread Varun Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694361#comment-13694361
 ] 

Varun Sharma commented on HBASE-8370:
-

Having a cache hit ratio of 80 % means that at least 80 % of my requests are 
fast (assuming GC is out of the picture). In the current scenario, that may 
map to a number like 99.9 %, and tomorrow, if I had 0 % cache hits for data 
blocks, the number comes down to 99.5 % - I am able to calculate this based 
on the numbers I pasted above. It assumes a certain distribution between the 
number of accesses to Index blocks and Data blocks. Tomorrow, if the 
distribution changes, it may well be that a 99.5 % overall cache hit ratio 
corresponds to a 90 % hit rate on data blocks. So, I don't think that the 
overall cache hit ratio is a good proxy for the Data block cache hit ratio.

As far as derivatives go, the miss count derivative can go up with other 
things like read request count - so now we would also need to take a 
derivative on that counter and compare, etc. On 0.94, that number has been 
overflowing for us all the time and is negative; is that being fixed in trunk?

I don't think this is about counters vs gauges. I am fine with exposing 
counters per block type. Right now, I just don't have any insight into the 
block cache, which plays an important role in serving reads. When a 
compaction happens and new files are written, I don't know the number of 
cache misses for Index blocks vs Data blocks vs Bloom blocks. I would no 
longer know how many Data blocks are being accessed and how many Index 
blocks, etc.
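
The weighted-average effect is easy to see with made-up numbers (illustrative 
only, not actual cluster counters):

{code}
// The aggregate hit ratio is a weighted average over block types, so it
// tracks the index/data access mix rather than the data-block hit rate.
public class WeightedRatioSketch {
  public static void main(String[] args) {
    long indexReads = 10000, indexHits = 10000; // index blocks: ~100% hits
    long dataReads = 100;
    for (double dataHitRate : new double[] {0.8, 0.0}) {
      long dataHits = Math.round(dataReads * dataHitRate);
      double overall = 100.0 * (indexHits + dataHits) / (indexReads + dataReads);
      // 80% data hits -> 99.80% overall; 0% data hits -> 99.01% overall
      System.out.printf("data hit %.0f%% -> overall %.2f%%%n",
          100 * dataHitRate, overall);
    }
  }
}
{code}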


 Report data block cache hit rates apart from aggregate cache hit rates
 --

 Key: HBASE-8370
 URL: https://issues.apache.org/jira/browse/HBASE-8370
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Minor

 Attaching from mail to d...@hbase.apache.org
 I am wondering whether the HBase cachingHitRatio metrics that the region 
 server UI shows, can get me a breakdown by data blocks. I always see this 
 number to be very high, and that could be exaggerated by the fact that each 
 lookup hits the index blocks and bloom filter blocks in the block cache 
 before retrieving the data block. This could be artificially bloating up the 
 cache hit ratio.
 Assuming the above is correct, do we already have a cache hit ratio for data 
 blocks alone which is more obscure ? If not, my sense is that it would be 
 pretty valuable to add one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates

2013-06-26 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694377#comment-13694377
 ] 

Elliott Clark commented on HBASE-8370:
--

bq.Having a cache hit ratio of 80 % means that at least 80 % of my requests are 
fast
I would disagree. 

* Full handlers
* Giant gets of large amounts of data.
* Gets without a proper bloom filter.
* Things that skip past lots of (cached) blocks
* Slow data block encoding
* slow filters
* slow network
* lock contention
* GC

There are TONS of other reasons that your requests can be slow.  And without 
knowing the workload you can't tell if a cache miss is more or less likely 
than any other explanation.  I've seen workloads where the cache percent was 
in the low teens and I've seen workloads where the cache percent was really 
100%.  There's no way a priori to know if a number is good or bad.  So you are 
again back to using the metrics with a baseline and comparing them.  For 
that, the absolute numbers are less important.


bq.As far as derivatives go, the miss count derivative can go up with other 
things like read request count
Yep, and that makes things harder, but the only thing that's not susceptible 
is gauges.  And like I said before, I'm trying to move us off of gauges.

bq.I don't know the number of cache misses for Index blocks vs Data blocks vs 
Bloom blocks. I would no longer know how many Data blocks are being accessed 
and how many Index blocks, etc.
But those aren't actionable metrics.  

* If your bloom block cache hit count goes down you can do... Not much. Not 
worth counting if you can't take action on it.
* With the way the index blocks work you can't cache miss them, after the 
first time, unless we're OOM (they aren't ever evicted, even if you turn off 
caching for the cf).  So you'll see that there are some misses on region open, 
and anytime there's a new flush or compaction. So it will be 100%.  Compaction 
and flush metrics are much more useful here for determining this kind of 
thing, so there's no need to add more metrics for something that's better 
covered somewhere else.
* So data blocks are the only useful ones, and they dominate the number of 
blocks requested. So this can pretty well be covered by the following.
** blockCacheExpressHitPercent
** blockCountHitPercent
** blockCacheHitCount
** blockCacheMissCount

I'm -1 adding any more metrics on the read path unless there's something that's 
totally missed (Jeremy brought up a couple the last time I met with him).  That 
code is just too important to be instrumented any more for things that can be 
figured out other ways (and I would argue better ways but that's less 
important).

I'm +1 on making that cache hit percent a double so there's more accuracy.

 Report data block cache hit rates apart from aggregate cache hit rates
 --

 Key: HBASE-8370
 URL: https://issues.apache.org/jira/browse/HBASE-8370
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Minor

 Attaching from mail to d...@hbase.apache.org
 I am wondering whether the HBase cachingHitRatio metrics that the region 
 server UI shows, can get me a breakdown by data blocks. I always see this 
 number to be very high, and that could be exaggerated by the fact that each 
 lookup hits the index blocks and bloom filter blocks in the block cache 
 before retrieving the data block. This could be artificially bloating up the 
 cache hit ratio.
 Assuming the above is correct, do we already have a cache hit ratio for data 
 blocks alone which is more obscure ? If not, my sense is that it would be 
 pretty valuable to add one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8496) Implement tags and the internals of how a tag should look like

2013-06-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694379#comment-13694379
 ] 

Ted Yu commented on HBASE-8496:
---

On page 6:
bq. In case of per HFile
The sentence seems to be incomplete.

bq. once we close the file we add the Meta data saying tagpresent = true and 
avg_tag_len = 0.
avg_tag_len = 0 would indicate that there is no tag present. Why do we need two 
flags (tagpresent and avg_tag_len) ?
Later compaction is mentioned where tagpresent is changed to false. But we 
should be able to achieve this at the time of flush, right ?

{code}
byte[] tagArray = kv.getTagsArray();
Tag decodeTag = KeyValueUtil.decodeTag(tagArray);
{code}
In the above sample, I would expect decodeTag() to return more than one Tag.
Would all Tags in the KeyValue be returned to filterKeyValue() ? I think it 
would be better if Tag.Type.Visibility is passed to decodeTag() so that only 
the visibility Tag is returned.
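
To make the suggestion concrete, a self-contained sketch of decoding only tags 
of a requested type (Tag and TagType here are stand-ins; the design is still 
under review, so none of these names are committed API):

{code}
import java.util.ArrayList;
import java.util.List;

public class TagDecodeSketch {
  enum TagType { VISIBILITY, OTHER }
  static class Tag {
    final TagType type;
    Tag(TagType type) { this.type = type; }
  }
  // Return only the tags of the wanted type, so e.g. filterKeyValue() sees
  // just the visibility tag rather than every tag on the KeyValue.
  static List<Tag> decodeTags(List<Tag> allTags, TagType wanted) {
    List<Tag> out = new ArrayList<Tag>();
    for (Tag t : allTags) {
      if (t.type == wanted) {
        out.add(t);
      }
    }
    return out;
  }
}
{code}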

 Implement tags and the internals of how a tag should look like
 --

 Key: HBASE-8496
 URL: https://issues.apache.org/jira/browse/HBASE-8496
 Project: HBase
  Issue Type: New Feature
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.98.0

 Attachments: Tag design.pdf


 The intent of this JIRA comes from HBASE-7897.
 This would help us to decide on the structure and format of how the tags 
 should look like. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8790) NullPointerException thrown when stopping regionserver

2013-06-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694384#comment-13694384
 ] 

Ted Yu commented on HBASE-8790:
---

Integrated to 0.95 and trunk.

Thanks for the patch, Liang.

Thanks for the review, Ram.
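
The committed HBase-8790.txt patch is not inlined in this digest; a 
self-contained sketch of the defensive pattern that avoids the kind of race 
described below (not necessarily the actual fix):

{code}
// Snapshot the volatile stub into a local before use, so a concurrent
// shutdown that nulls the field cannot slip in between check and call.
public class StubGuardSketch {
  private volatile Runnable rssStub = new Runnable() {
    public void run() { /* send the region server report */ }
  };
  void tryReport() {
    Runnable rss = rssStub;  // single read of the volatile field
    if (rss == null) {
      return;                // stub cleared during stop; skip this report
    }
    rss.run();
  }
  void stop() { rssStub = null; }
}
{code}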

 NullPointerException thrown when stopping regionserver
 --

 Key: HBASE-8790
 URL: https://issues.apache.org/jira/browse/HBASE-8790
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.95.1
 Environment: CentOS 5.9 x86_64, java version 1.6.0_45, CDH4.3
Reporter: Xiong LIU
Assignee: Liang Xie
 Attachments: HBase-8790.txt


 The Hbase cluster is a fresh start with one regionserver.
 When we stop hbase, an unhandled NullPointerException is thrown in the 
 regionserver.
 The regionserver's log is as follows:
 2013-06-21 10:21:11,284 INFO  [regionserver61020] regionserver.HRegionServer: 
 Closing user regions
 2013-06-21 10:21:14,288 DEBUG [regionserver61020] regionserver.HRegionServer: 
 Waiting on 1028785192
 2013-06-21 10:21:14,290 FATAL [regionserver61020] regionserver.HRegionServer: 
 ABORTING region server HOSTNAME_TEST,61020,1371781086817
 : Unhandled: null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:988)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:832)
 at java.lang.Thread.run(Thread.java:662)
 2013-06-21 10:21:14,292 FATAL [regionserver61020] regionserver.HRegionServer: 
 RegionServer abort: loaded coprocessors are: [org.apache
 .hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
 2013-06-21 10:21:14,293 INFO  [regionserver61020] regionserver.HRegionServer: 
 STOPPED: Unhandled: null
 2013-06-21 10:21:14,293 INFO  [regionserver61020] ipc.RpcServer: Stopping 
 server on 61020
 It seems that after closing user regions, the rssStub is null.
 update:
 we found that if we set hbase.client.ipc.pool.type to RoundRobinPool (or 
 another pool type) and hbase.client.ipc.pool.size to 10 (possibly other 
 values) in hbase-site.xml, the regionserver continuously attempts to connect 
 to the master, and if we stop hbase, the above NullPointerException occurs. 
 With hbase.client.ipc.pool.size set to 1, the cluster can be completely 
 stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8790) NullPointerException thrown when stopping regionserver

2013-06-26 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-8790:
--

Fix Version/s: 0.95.2
   0.98.0
 Hadoop Flags: Reviewed

 NullPointerException thrown when stopping regionserver
 --

 Key: HBASE-8790
 URL: https://issues.apache.org/jira/browse/HBASE-8790
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.95.1
 Environment: CentOS 5.9 x86_64, java version 1.6.0_45, CDH4.3
Reporter: Xiong LIU
Assignee: Liang Xie
 Fix For: 0.98.0, 0.95.2

 Attachments: HBase-8790.txt


 The Hbase cluster is a fresh start with one regionserver.
 When we stop hbase, an unhandled NullPointerException is thrown in the 
 regionserver.
 The regionserver's log is as follows:
 2013-06-21 10:21:11,284 INFO  [regionserver61020] regionserver.HRegionServer: 
 Closing user regions
 2013-06-21 10:21:14,288 DEBUG [regionserver61020] regionserver.HRegionServer: 
 Waiting on 1028785192
 2013-06-21 10:21:14,290 FATAL [regionserver61020] regionserver.HRegionServer: 
 ABORTING region server HOSTNAME_TEST,61020,1371781086817
 : Unhandled: null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:988)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:832)
 at java.lang.Thread.run(Thread.java:662)
 2013-06-21 10:21:14,292 FATAL [regionserver61020] regionserver.HRegionServer: 
 RegionServer abort: loaded coprocessors are: [org.apache
 .hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
 2013-06-21 10:21:14,293 INFO  [regionserver61020] regionserver.HRegionServer: 
 STOPPED: Unhandled: null
 2013-06-21 10:21:14,293 INFO  [regionserver61020] ipc.RpcServer: Stopping 
 server on 61020
 It seems that after closing user regions, the rssStub is null.
 update:
 we found that if we set hbase.client.ipc.pool.type to RoundRobinPool (or 
 another pool type) and hbase.client.ipc.pool.size to 10 (possibly other 
 values) in hbase-site.xml, the regionserver continuously attempts to connect 
 to the master, and if we stop hbase, the above NullPointerException occurs. 
 With hbase.client.ipc.pool.size set to 1, the cluster can be completely 
 stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates

2013-06-26 Thread Varun Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694389#comment-13694389
 ] 

Varun Sharma commented on HBASE-8370:
-

We can make the hit percent a double.

But if we never evict index blocks, one option is to only count DataBlocks for 
HitPercent, CacheHitCount, CacheMissCount. I know that is not the case for 
0.94. Is that the case for trunk or can we change these metrics to only 
instrument data blocks then ?

Anyone else have opinions ?

 Report data block cache hit rates apart from aggregate cache hit rates
 --

 Key: HBASE-8370
 URL: https://issues.apache.org/jira/browse/HBASE-8370
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Minor

 Attaching from mail to d...@hbase.apache.org
 I am wondering whether the HBase cachingHitRatio metrics that the region 
 server UI shows, can get me a breakdown by data blocks. I always see this 
 number to be very high, and that could be exaggerated by the fact that each 
 lookup hits the index blocks and bloom filter blocks in the block cache 
 before retrieving the data block. This could be artificially bloating up the 
 cache hit ratio.
 Assuming the above is correct, do we already have a cache hit ratio for data 
 blocks alone which is more obscure ? If not, my sense is that it would be 
 pretty valuable to add one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8802) totalCompactingKVs overflow

2013-06-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694393#comment-13694393
 ] 

Hudson commented on HBASE-8802:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #585 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/585/])
HBASE-8802 totalCompactingKVs overflow (Chao Shi) (Revision 1497050)

 Result = FAILURE
sershe : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java


 totalCompactingKVs overflow
 ---

 Key: HBASE-8802
 URL: https://issues.apache.org/jira/browse/HBASE-8802
 Project: HBase
  Issue Type: Bug
Reporter: Chao Shi
Priority: Trivial
 Attachments: hbase-8802.patch


 I happened to get a very large region (mistakenly bulk loading tons of 
 HFiles into a single region). When it's getting compacted, the webUI shows an 
 overflowed totalCompactingKVs. I found this is due to 
 Compactor#FileDetails#maxKeyCount being int32. It is not a big deal, as this 
 variable is only used for displaying compaction progress; everywhere else 
 uses long.
 totalCompactingKVs=1909276739, currentCompactedKVs=11308733425, 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8799) TestAccessController#testBulkLoad has been failing for some time on trunk/0.95

2013-06-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694394#comment-13694394
 ] 

Hudson commented on HBASE-8799:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #585 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/585/])
HBASE-8799 TestAccessController#testBulkLoad has been failing for some time 
on trunk/0.95 -- ADDING DEBUG (Revision 1497123)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java


 TestAccessController#testBulkLoad has been failing for some time on trunk/0.95
 --

 Key: HBASE-8799
 URL: https://issues.apache.org/jira/browse/HBASE-8799
 Project: HBase
  Issue Type: Bug
  Components: Coprocessors, security, test
Affects Versions: 0.98.0, 0.95.2
Reporter: Andrew Purtell
Assignee: stack
 Fix For: 0.95.2

 Attachments: 8799.txt


 I've observed this in Jenkins reports and also while I was working on 
 HBASE-8692, only on trunk/0.95, not on 0.94:
 {quote}
 Failed tests:   
 testBulkLoad(org.apache.hadoop.hbase.security.access.TestAccessController): 
 Expected action to pass for user 'rwuser' but was denied
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8410) Basic quota support for namespaces

2013-06-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694401#comment-13694401
 ] 

Ted Yu commented on HBASE-8410:
---

For NamespaceController, please add class javadoc and audience annotation.
{code}
+zkManager = new ZKNamespaceManager(zk);
+zkManager.start();
{code}
I saw the following in TableNamespaceManager:
{code}
+zkNamespaceManager = new ZKNamespaceManager(masterServices.getZooKeeper());
+zkNamespaceManager.start();
{code}
So we may have more than one ZKNamespaceManager running in the same JVM ?

The spacing is slightly off: two spaces should be used per indentation.
{code}
+maxRegions = Long.parseLong(value);
+  } catch (NumberFormatException exp) {
+throw new ConstraintException("NumberFormatException while getting max 
regions.", exp);
{code}
Please include the value in the exception message.
{code}
+currentStatus = 
getNamespaceQuota(ctx.getEnvironment().getConfiguration(),
+  nspdesc.getName());
{code}
I think there should be a better name for getNamespaceQuota, because a quota 
should be the setting governing the namespace, which is different from the 
current status of the underlying namespace.

I wonder if using table and region count is a good way for enforcing quota 
because the underlying region size can vary.
{code}
+  + " is not allowed to have " + regions.length
+  + " number of regions. The total number of regions permitted are only "
{code}
Remove 'number of '. 'permitted are only' -> 'permitted is only'.

For class NamespaceQuota, please add javadoc and audience annotation. I think 
the class should be renamed because it reflects the status of a namespace, not 
its quota.
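
For reference, a rough sketch of the kind of check under review (names such as 
nsDescriptor and currentStatus are hypothetical, not the patch itself):

{code}
// Before creating a table, compare the namespace's current table count
// against the quota carried as a property on the namespace descriptor.
String quota = nsDescriptor.getValue("hbase.namespace.quota.maxtables");
if (quota != null) {
  long maxTables;
  try {
    maxTables = Long.parseLong(quota);
  } catch (NumberFormatException exp) {
    // include the offending value in the message, per the review comment
    throw new ConstraintException("Invalid maxtables quota: " + quota, exp);
  }
  if (currentStatus.getTableCount() + 1 > maxTables) {
    throw new ConstraintException("Namespace " + nsDescriptor.getName()
        + " would exceed its quota of " + maxTables + " tables");
  }
}
{code}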

 Basic quota support for namespaces
 --

 Key: HBASE-8410
 URL: https://issues.apache.org/jira/browse/HBASE-8410
 Project: HBase
  Issue Type: Sub-task
Reporter: Francis Liu
Assignee: Vandana Ayyalasomayajula
 Attachments: HBASE_8410.patch


 This task involves creating an observer which provides basic quota support to 
 namespaces in terms of (1) number of tables and (2) number of regions. The 
 quota support can be enabled by setting:
 <property>
 <name>hbase.coprocessor.region.classes</name>
 <value>org.apache.hadoop.hbase.namespace.NamespaceController</value>
 </property>
 <property>
 <name>hbase.coprocessor.master.classes</name>
 <value>org.apache.hadoop.hbase.namespace.NamespaceController</value>
 </property>
 in the hbase-site.xml.
 To add quotas to a namespace, properties need to be added while creating the 
 namespace.
 Examples:
 1. namespace_create 'ns1', {'hbase.namespace.quota.maxregion'=>'10'}
 2. namespace_create 'ns2', {'hbase.namespace.quota.maxtables'=>'2'}, 
 {'hbase.namespace.quota.maxregion'=>'5'}
 The quotas can be modified/added to a namespace at any point in time. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8802) totalCompactingKVs may overflow

2013-06-26 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-8802:
--

Fix Version/s: 0.95.2
   0.98.0
  Summary: totalCompactingKVs may overflow  (was: totalCompactingKVs 
overflow)
 Hadoop Flags: Reviewed

 totalCompactingKVs may overflow
 ---

 Key: HBASE-8802
 URL: https://issues.apache.org/jira/browse/HBASE-8802
 Project: HBase
  Issue Type: Bug
Reporter: Chao Shi
Priority: Trivial
 Fix For: 0.98.0, 0.95.2

 Attachments: hbase-8802.patch


 I happened to get a very large region (mistakenly bulk loading tons of 
 HFiles into a single region). When it's getting compacted, the webUI shows an 
 overflowed totalCompactingKVs. I found this is due to 
 Compactor#FileDetails#maxKeyCount being int32. It is not a big deal, as this 
 variable is only used for displaying compaction progress; everywhere else 
 uses long.
 totalCompactingKVs=1909276739, currentCompactedKVs=11308733425, 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-8802) totalCompactingKVs may overflow

2013-06-26 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-8802:
-

Assignee: Chao Shi

 totalCompactingKVs may overflow
 ---

 Key: HBASE-8802
 URL: https://issues.apache.org/jira/browse/HBASE-8802
 Project: HBase
  Issue Type: Bug
Reporter: Chao Shi
Assignee: Chao Shi
Priority: Trivial
 Fix For: 0.98.0, 0.95.2

 Attachments: hbase-8802.patch


 I happened to get a very large region (mistakenly bulk loading tons of 
 HFiles into a single region). When it's getting compacted, the webUI shows an 
 overflowed totalCompactingKVs. I found this is due to 
 Compactor#FileDetails#maxKeyCount being int32. It is not a big deal, as this 
 variable is only used for displaying compaction progress; everywhere else 
 uses long.
 totalCompactingKVs=1909276739, currentCompactedKVs=11308733425, 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-8811) REST service ignores misspelled check= parameter, causing unexpected mutations

2013-06-26 Thread Chip Salzenberg (JIRA)
Chip Salzenberg created HBASE-8811:
--

 Summary: REST service ignores misspelled check= parameter, 
causing unexpected mutations
 Key: HBASE-8811
 URL: https://issues.apache.org/jira/browse/HBASE-8811
 Project: HBase
  Issue Type: Bug
  Components: REST
Affects Versions: 0.95.1
Reporter: Chip Salzenberg
Priority: Critical


In rest.RowResource.update(), this code keeps executing a request if a 
misspelled check= parameter is provided.
{noformat}
if (CHECK_PUT.equalsIgnoreCase(check)) {
  return checkAndPut(model);
} else if (CHECK_DELETE.equalsIgnoreCase(check)) {
  return checkAndDelete(model);
} else if (check != null && check.length() > 0) {
  LOG.warn("Unknown check value: " + check + ", ignored");
}
{noformat}

By my reading of the code, this results in the provided cell value that was 
intended as a check instead being treated as a mutation, which is sure to 
destroy user data.  Thus the priority of this bug, as it can cause corruption.

I suggest that a better reaction than a warning would be, approximately:

{noformat}
return Response.status(Response.Status.BAD_REQUEST)
.type(MIMETYPE_TEXT).entity("Invalid check value '" + check + "'")
.build();
{noformat}
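
Putting the suggestion in context, the warn-and-continue branch would become 
(a sketch combining the two snippets above):

{noformat}
if (CHECK_PUT.equalsIgnoreCase(check)) {
  return checkAndPut(model);
} else if (CHECK_DELETE.equalsIgnoreCase(check)) {
  return checkAndDelete(model);
} else if (check != null && check.length() > 0) {
  // reject unrecognized check values instead of falling through to a mutation
  return Response.status(Response.Status.BAD_REQUEST)
      .type(MIMETYPE_TEXT).entity("Invalid check value '" + check + "'")
      .build();
}
{noformat}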


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates

2013-06-26 Thread Varun Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694433#comment-13694433
 ] 

Varun Sharma commented on HBASE-8370:
-

Also, coming back to the point about the metrics being actionable.

RE:If your bloom block cache hit count goes down you can do... Not much. 
Not worth counting if you can't take action on it.

I disagree that it's not actionable. I would go fix the block cache in this 
case. It means there is something seriously wrong with our implementation of 
the block cache if we are evicting bloom blocks - maybe it's just me, but I 
feel we should not be evicting bloom blocks.

- If the cache hit rate is too low on Data Blocks, the action item is to 
increase the Block Cache size.

I would agree that index block metrics are not needed or actionable if it is 
indeed the case that we pin index blocks forever.





 Report data block cache hit rates apart from aggregate cache hit rates
 --

 Key: HBASE-8370
 URL: https://issues.apache.org/jira/browse/HBASE-8370
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Minor

 Attaching from mail to d...@hbase.apache.org
 I am wondering whether the HBase cachingHitRatio metrics that the region 
 server UI shows, can get me a breakdown by data blocks. I always see this 
 number to be very high, and that could be exaggerated by the fact that each 
 lookup hits the index blocks and bloom filter blocks in the block cache 
 before retrieving the data block. This could be artificially bloating up the 
 cache hit ratio.
 Assuming the above is correct, do we already have a cache hit ratio for data 
 blocks alone which is more obscure ? If not, my sense is that it would be 
 pretty valuable to add one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-8812) Avoid a wide line on the HMaster webUI if we have more zookeeper quorums

2013-06-26 Thread Fengdong Yu (JIRA)
Fengdong Yu created HBASE-8812:
--

 Summary: Avoid a wide line on the HMaster webUI if we have more 
zookeeper quorums
 Key: HBASE-8812
 URL: https://issues.apache.org/jira/browse/HBASE-8812
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: Fengdong Yu
Priority: Minor




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8812) Avoid a wide line on the HMaster webUI if we have more zookeeper quorums

2013-06-26 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HBASE-8812:
---

Description: add a line break for every four zookeeper quorums on the 
HMaster webUI.
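
The attached patch is not inlined here; an illustrative sketch of the idea 
(variable names are made up):

{code}
// Rebuild the quorum string with a <br/> after every fourth host so the
// list wraps on the master UI instead of producing one very wide line.
String[] hosts = quorum.split(",");
StringBuilder sb = new StringBuilder();
for (int i = 0; i < hosts.length; i++) {
  if (i > 0) {
    sb.append(i % 4 == 0 ? "<br/>" : ", ");
  }
  sb.append(hosts[i]);
}
String wrapped = sb.toString();
{code}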

 Avoid a wide line on the HMaster webUI if we have more zookeeper quorums
 

 Key: HBASE-8812
 URL: https://issues.apache.org/jira/browse/HBASE-8812
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: Fengdong Yu
Priority: Minor

 add a line break for every four zookeeper quorums on the HMaster webUI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8812) Avoid a wide line on the HMaster webUI if we have more zookeeper quorums

2013-06-26 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HBASE-8812:
---

Status: Patch Available  (was: Open)

 Avoid a wide line on the HMaster webUI if we have more zookeeper quorums
 

 Key: HBASE-8812
 URL: https://issues.apache.org/jira/browse/HBASE-8812
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: Fengdong Yu
Priority: Minor
 Attachments: HBASE-8812.patch


 add a line break for every four zookeeper quorums on the HMaster webUI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8790) NullPointerException thrown when stopping regionserver

2013-06-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694442#comment-13694442
 ] 

Hudson commented on HBASE-8790:
---

Integrated in hbase-0.95 #271 (See 
[https://builds.apache.org/job/hbase-0.95/271/])
HBASE-8790 NullPointerException thrown when stopping regionserver (Liang 
Xie) (Revision 1497172)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java


 NullPointerException thrown when stopping regionserver
 --

 Key: HBASE-8790
 URL: https://issues.apache.org/jira/browse/HBASE-8790
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.95.1
 Environment: CentOS 5.9 x86_64, java version 1.6.0_45, CDH4.3
Reporter: Xiong LIU
Assignee: Liang Xie
 Fix For: 0.98.0, 0.95.2

 Attachments: HBase-8790.txt


 The Hbase cluster is a fresh start with one regionserver.
 When we stop hbase, an unhandled NullPointerException is thrown in the 
 regionserver.
 The regionserver's log is as follows:
 2013-06-21 10:21:11,284 INFO  [regionserver61020] regionserver.HRegionServer: 
 Closing user regions
 2013-06-21 10:21:14,288 DEBUG [regionserver61020] regionserver.HRegionServer: 
 Waiting on 1028785192
 2013-06-21 10:21:14,290 FATAL [regionserver61020] regionserver.HRegionServer: 
 ABORTING region server HOSTNAME_TEST,61020,1371781086817
 : Unhandled: null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:988)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:832)
 at java.lang.Thread.run(Thread.java:662)
 2013-06-21 10:21:14,292 FATAL [regionserver61020] regionserver.HRegionServer: 
 RegionServer abort: loaded coprocessors are: [org.apache
 .hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
 2013-06-21 10:21:14,293 INFO  [regionserver61020] regionserver.HRegionServer: 
 STOPPED: Unhandled: null
 2013-06-21 10:21:14,293 INFO  [regionserver61020] ipc.RpcServer: Stopping 
 server on 61020
 It seems that after closing user regions, the rssStub is null.
 update:
 we found that if we set hbase.client.ipc.pool.type to RoundRobinPool (or 
 another pool type) and hbase.client.ipc.pool.size to 10 (possibly other 
 values) in hbase-site.xml, the regionserver continuously attempts to connect 
 to the master, and if we stop hbase, the above NullPointerException occurs. 
 With hbase.client.ipc.pool.size set to 1, the cluster can be completely 
 stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8812) Avoid a wide line on the HMaster webUI if we have more zookeeper quorums

2013-06-26 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HBASE-8812:
---

Attachment: HBASE-8812.patch

 Avoid a wide line on the HMaster webUI if we have more zookeeper quorums
 

 Key: HBASE-8812
 URL: https://issues.apache.org/jira/browse/HBASE-8812
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: Fengdong Yu
Priority: Minor
 Attachments: HBASE-8812.patch


 add a line break for every four zookeeper quorums on the HMaster webUI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8812) Avoid a wide line on the HMaster webUI if we have more zookeeper quorums

2013-06-26 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HBASE-8812:
---

Description: 
add a line break for every four zookeeper quorums on the HMaster webUI.

I don't think this needs a test case; manual testing is enough. I've tested it 
on my testing cluster and everything works well.

  was:add a line break for every four zookeeper quorums on the HMaster webUI.


 Avoid a wide line on the HMaster webUI if we have more zookeeper quorums
 

 Key: HBASE-8812
 URL: https://issues.apache.org/jira/browse/HBASE-8812
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: Fengdong Yu
Priority: Minor
 Attachments: HBASE-8812.patch


 add a line break for every four zookeeper quorums on the HMaster webUI.
 I don't think this needs a test case; manual testing is enough. I've tested 
 it on my testing cluster and everything works well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-8808) Use Jacoco to generate Unit Test coverage reports

2013-06-26 Thread Manukranth Kolloju (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manukranth Kolloju reassigned HBASE-8808:
-

Assignee: Manukranth Kolloju

 Use Jacoco to generate Unit Test coverage reports
 -

 Key: HBASE-8808
 URL: https://issues.apache.org/jira/browse/HBASE-8808
 Project: HBase
  Issue Type: Bug
Reporter: Manukranth Kolloju
Assignee: Manukranth Kolloju
Priority: Trivial

 Enabling the code coverage tool JaCoCo in Maven.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Work started] (HBASE-8808) Use Jacoco to generate Unit Test coverage reports

2013-06-26 Thread Manukranth Kolloju (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-8808 started by Manukranth Kolloju.

 Use Jacoco to generate Unit Test coverage reports
 -

 Key: HBASE-8808
 URL: https://issues.apache.org/jira/browse/HBASE-8808
 Project: HBase
  Issue Type: Bug
Reporter: Manukranth Kolloju
Assignee: Manukranth Kolloju
Priority: Trivial

 Enabling the code coverage tool JaCoCo in Maven.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8808) Use Jacoco to generate Unit Test coverage reports

2013-06-26 Thread Manukranth Kolloju (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manukranth Kolloju updated HBASE-8808:
--

Attachment: Screen Shot 2013-06-25 at 11.35.30 AM.png

Attaching a sample report of the test coverage in our test suite.

 Use Jacoco to generate Unit Test coverage reports
 -

 Key: HBASE-8808
 URL: https://issues.apache.org/jira/browse/HBASE-8808
 Project: HBase
  Issue Type: Bug
Reporter: Manukranth Kolloju
Assignee: Manukranth Kolloju
Priority: Trivial
 Attachments: Screen Shot 2013-06-25 at 11.35.30 AM.png


 Enabling the code coverage tool JaCoCo in Maven.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8808) Use Jacoco to generate Unit Test coverage reports

2013-06-26 Thread Manukranth Kolloju (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694460#comment-13694460
 ] 

Manukranth Kolloju commented on HBASE-8808:
---

Collecting unit test coverage doesn't take too long: unit test execution time 
increases by roughly 10-20%, and report generation is quick once the execution 
data has been collected. A very nice feature of this tool is that it reports 
branch coverage, unlike the other code coverage tools out there.
I can create a patch and port it to the open-source branch if people are 
interested in this tool and if that is possible.
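
For context, the usual Maven setup binds the jacoco-maven-plugin's 
prepare-agent and report goals in the pom; the attached change itself is not 
shown here. As a hedged sketch of the branch-coverage data JaCoCo collects 
(class names follow JaCoCo's current core API; the file paths are illustrative 
assumptions), this self-contained program reads a jacoco.exec file and prints 
per-class branch counts:

import java.io.File;

import org.jacoco.core.analysis.Analyzer;
import org.jacoco.core.analysis.CoverageBuilder;
import org.jacoco.core.analysis.IClassCoverage;
import org.jacoco.core.analysis.ICounter;
import org.jacoco.core.tools.ExecFileLoader;

public class BranchCoverageDump {
    public static void main(String[] args) throws Exception {
        // Execution data written by the prepare-agent goal during "mvn test".
        ExecFileLoader loader = new ExecFileLoader();
        loader.load(new File("target/jacoco.exec"));

        // Re-analyze the compiled classes against the recorded execution data.
        CoverageBuilder builder = new CoverageBuilder();
        Analyzer analyzer = new Analyzer(loader.getExecutionDataStore(), builder);
        analyzer.analyzeAll(new File("target/classes"));

        // Branch coverage is the figure plain line-coverage tools don't report.
        for (IClassCoverage cc : builder.getClasses()) {
            ICounter branches = cc.getBranchCounter();
            System.out.printf("%s: %d/%d branches covered%n",
                cc.getName(), branches.getCoveredCount(), branches.getTotalCount());
        }
    }
}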

 Use Jacoco to generate Unit Test coverage reports
 -

 Key: HBASE-8808
 URL: https://issues.apache.org/jira/browse/HBASE-8808
 Project: HBase
  Issue Type: Bug
Reporter: Manukranth Kolloju
Assignee: Manukranth Kolloju
Priority: Trivial
 Attachments: Screen Shot 2013-06-25 at 11.35.30 AM.png


 Enabling the code coverage tool JaCoCo in Maven.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-8808) Use Jacoco to generate Unit Test coverage reports

2013-06-26 Thread Manukranth Kolloju (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manukranth Kolloju resolved HBASE-8808.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

 Use Jacoco to generate Unit Test coverage reports
 -

 Key: HBASE-8808
 URL: https://issues.apache.org/jira/browse/HBASE-8808
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.89-fb
Reporter: Manukranth Kolloju
Assignee: Manukranth Kolloju
Priority: Trivial
 Fix For: 0.89-fb

 Attachments: Screen Shot 2013-06-25 at 11.35.30 AM.png

   Original Estimate: 24h
  Remaining Estimate: 24h

 Enabling the code coverage tool JaCoCo in Maven.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8808) Use Jacoco to generate Unit Test coverage reports

2013-06-26 Thread Manukranth Kolloju (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manukranth Kolloju updated HBASE-8808:
--

   Component/s: build
 Affects Version/s: 0.89-fb
 Fix Version/s: 0.89-fb
Remaining Estimate: 24h
 Original Estimate: 24h

 Use Jacoco to generate Unit Test coverage reports
 -

 Key: HBASE-8808
 URL: https://issues.apache.org/jira/browse/HBASE-8808
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.89-fb
Reporter: Manukranth Kolloju
Assignee: Manukranth Kolloju
Priority: Trivial
 Fix For: 0.89-fb

 Attachments: Screen Shot 2013-06-25 at 11.35.30 AM.png

   Original Estimate: 24h
  Remaining Estimate: 24h

 Enabling the code coverage tool JaCoCo in Maven.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-8491) Fixing the TestHeapSizes.

2013-06-26 Thread Manukranth Kolloju (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manukranth Kolloju resolved HBASE-8491.
---

Resolution: Fixed

 Fixing the TestHeapSizes.
 -

 Key: HBASE-8491
 URL: https://issues.apache.org/jira/browse/HBASE-8491
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb
Reporter: Manukranth Kolloju
Assignee: Manukranth Kolloju
Priority: Trivial
 Fix For: 0.89-fb


 Account for the extra references that were added: did an absolute count of 
 the non-static variables and updated the expected heap sizes accordingly.
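
 As a hedged, self-contained illustration of the bookkeeping such a test 
 checks (the header and reference sizes below are assumptions, not HBase's 
 ClassSize constants), an expected heap size is an explicit count of the 
 object header, references, and primitive fields, rounded up to 8-byte 
 alignment, so every added field must bump the count:

 public class HeapSizeExample {
     static final int OBJECT_HEADER = 16; // assumed 64-bit JVM object header
     static final int REFERENCE = 8;      // assumed uncompressed oops

     // Expected size for a class with the given non-static field counts.
     static long expectedSize(int numReferences, int numLongs) {
         long size = OBJECT_HEADER + numReferences * REFERENCE + numLongs * 8L;
         // Round up to the 8-byte alignment the JVM uses.
         return (size + 7) / 8 * 8;
     }

     public static void main(String[] args) {
         // e.g. three object references plus one long field: 16 + 24 + 8 = 48
         System.out.println(expectedSize(3, 1)); // 48
     }
 }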

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-8491) Fixing the TestHeapSizes.

2013-06-26 Thread Manukranth Kolloju (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manukranth Kolloju reassigned HBASE-8491:
-

Assignee: Manukranth Kolloju

 Fixing the TestHeapSizes.
 -

 Key: HBASE-8491
 URL: https://issues.apache.org/jira/browse/HBASE-8491
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb
Reporter: Manukranth Kolloju
Assignee: Manukranth Kolloju
Priority: Trivial
 Fix For: 0.89-fb


 Account for the extra references that were added: did an absolute count of 
 the non-static variables and updated the expected heap sizes accordingly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

