[jira] [Commented] (HBASE-8789) Add max RPC version to meta-region-server zk node.
[ https://issues.apache.org/jira/browse/HBASE-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693688#comment-13693688 ] Elliott Clark commented on HBASE-8789: -- It's still 0. I think the 2 you're seeing is the field index in the protobuf definition. I fill out the rpcVersion with RPC_CURRENT_VERSION, which still seems to be 0. Add max RPC version to meta-region-server zk node. -- Key: HBASE-8789 URL: https://issues.apache.org/jira/browse/HBASE-8789 Project: HBase Issue Type: Bug Components: IPC/RPC, Zookeeper Affects Versions: 0.98.0, 0.95.1 Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-8789-0.patch, HBASE-8789-1.patch For clients to bootstrap themselves they need to know the max RPC version that the meta server will accept. We should add that to the zookeeper node.
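For anyone puzzled by where a 2 could show up when the shipped version is 0: in protobuf, the number after the equals sign in a field declaration is the field index, and it is that index (folded into the wire-format tag) that appears in the serialized bytes, not the value. A minimal sketch of the encoding rule, assuming a declaration shaped like optional uint32 rpc_version = 2 (the exact .proto layout here is an assumption, not quoted from the patch):
{code}
public class ProtoTagSketch {
  public static void main(String[] args) {
    // Assumed declaration (not verified against the actual .proto):
    //   optional uint32 rpc_version = 2;
    // The "= 2" is the field index, not a value.
    int fieldNumber = 2;                     // field index from the .proto
    int wireType = 0;                        // 0 = varint
    int tag = (fieldNumber << 3) | wireType; // the key byte actually serialized
    int value = 0;                           // the value shipped can still be 0
    System.out.println("tag=0x" + Integer.toHexString(tag) + ", value=" + value);
  }
}
{code}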
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693798#comment-13693798 ] Varun Sharma commented on HBASE-8370: - Here are some stats for this JIRA - I am arguing that the BlockCacheHit ratio number reported on a region server does not mean much.
tbl.feeds.cf.home.bt.Index.fsBlockReadCnt : 46864
tbl.feeds.cf.home.bt.Index.fsBlockReadCacheHitCnt : 46864
Index block cache hit ratio = 100%
tbl.feeds.cf.home.bt.Data.fsBlockReadCacheHitCnt : 202
tbl.feeds.cf.home.bt.Data.fsBlockReadCnt : 247
Data block cache hit ratio = 82%
Overall cache hit ratio = (46864 + 202) / (46864 + 247) = 99%
Since indexes are hit often, their cache hit rate is 100% and the number of hits is high. The real number we care about is 82%, the hit rate on the data blocks. However, we continue to show 99% on the region server console instead. I think we need to fix that number. Please let me know if folks object to this. Report data block cache hit rates apart from aggregate cache hit rates -- Key: HBASE-8370 URL: https://issues.apache.org/jira/browse/HBASE-8370 Project: HBase Issue Type: Improvement Components: metrics Reporter: Varun Sharma Assignee: Varun Sharma Priority: Minor Attaching from mail to d...@hbase.apache.org I am wondering whether the HBase cachingHitRatio metric that the region server UI shows can give me a breakdown by data blocks. I always see this number to be very high, and that could be exaggerated by the fact that each lookup hits the index blocks and bloom filter blocks in the block cache before retrieving the data block. This could be artificially bloating the cache hit ratio. Assuming the above is correct, do we already have a cache hit ratio for data blocks alone which is more obscure? If not, my sense is that it would be pretty valuable to add one.
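Spelling out the arithmetic, since it is the whole argument: the aggregate ratio is dominated by index-block hits. A self-contained sketch using the counter values quoted above (class and variable names are illustrative, not the real metrics code):
{code}
public class CacheHitRatios {
  public static void main(String[] args) {
    long indexHits = 46864, indexReads = 46864;  // Index.fsBlockReadCacheHitCnt / Cnt
    long dataHits = 202, dataReads = 247;        // Data.fsBlockReadCacheHitCnt / Cnt

    double indexRatio = 100.0 * indexHits / indexReads;                           // 100%
    double dataRatio = 100.0 * dataHits / dataReads;                              // ~82%
    double aggregate = 100.0 * (indexHits + dataHits) / (indexReads + dataReads); // ~99%

    // The aggregate is dominated by the cheap, always-hot index blocks,
    // which is why reporting it alone hides the ~82% data-block hit rate.
    System.out.printf("index=%.0f%% data=%.0f%% aggregate=%.0f%%%n",
        indexRatio, dataRatio, aggregate);
  }
}
{code}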
[jira] [Commented] (HBASE-8802) totalCompactingKVs overflow
[ https://issues.apache.org/jira/browse/HBASE-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693838#comment-13693838 ] Chao Shi commented on HBASE-8802: - It seems like the TestAccessController failure has nothing to do with this patch. It still fails when I run it on my box without this patch. totalCompactingKVs overflow --- Key: HBASE-8802 URL: https://issues.apache.org/jira/browse/HBASE-8802 Project: HBase Issue Type: Bug Reporter: Chao Shi Priority: Trivial Attachments: hbase-8802.patch I happened to get a very large region (by mistakenly bulk loading tons of HFiles into a single region). When it was getting compacted, the web UI showed an overflowed totalCompactingKVs: totalCompactingKVs=1909276739, currentCompactedKVs=11308733425. I found this is due to Compactor#FileDetails#maxKeyCount being an int32. It is not a big deal, as this variable is only used for displaying compaction progress; everywhere else uses long.
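For readers unfamiliar with the failure mode: accumulating per-file key counts into an int silently wraps once the sum passes 2^31 - 1, so the displayed total can end up smaller than the KVs already compacted. A minimal sketch with made-up file key counts (not the actual Compactor code):
{code}
public class KeyCountOverflow {
  public static void main(String[] args) {
    long[] fileKeyCounts = {6_000_000_000L, 5_308_733_425L}; // hypothetical HFile key counts

    int narrowTotal = 0;  // the bug: int32, like Compactor#FileDetails#maxKeyCount
    long wideTotal = 0L;  // the fix: use long throughout

    for (long kv : fileKeyCounts) {
      narrowTotal += (int) kv;  // wraps silently
      wideTotal += kv;
    }
    // narrowTotal prints a garbage (here negative) value,
    // while wideTotal shows the real figure.
    System.out.println("int total:  " + narrowTotal);
    System.out.println("long total: " + wideTotal);
  }
}
{code}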
[jira] [Updated] (HBASE-8496) Implement tags and the internals of how a tag should look like
[ https://issues.apache.org/jira/browse/HBASE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-8496: -- Attachment: Tag design.pdf Attaching a simple design document that describes how tags will be supported by HBase and the advantage of using KeyValueCodec. It also touches on how tags can be implemented in an optional way when we don't go with KeyValueCodec. Please feel free to share your comments/reviews. Thanks to Andy and Anoop for their reviews/suggestions. Implement tags and the internals of how a tag should look like -- Key: HBASE-8496 URL: https://issues.apache.org/jira/browse/HBASE-8496 Project: HBase Issue Type: New Feature Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.98.0 Attachments: Tag design.pdf The intent of this JIRA comes from HBASE-7897. This would help us to decide on the structure and format of how the tags should look.
[jira] [Commented] (HBASE-8793) Regionserver ubuntu's startup script return code always 0
[ https://issues.apache.org/jira/browse/HBASE-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693955#comment-13693955 ] Jean-Marc Spaggiari commented on HBASE-8793: Hi [~stack], Thanks for that! However, I tried to close 8803, to modify one comment, and to change the status, but I'm still not able to. I can modify the few attributes, but not the resolution nor the status. Same in this defect: not able to modify those fields. Regionserver ubuntu's startup script return code always 0 - Key: HBASE-8793 URL: https://issues.apache.org/jira/browse/HBASE-8793 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.94.6 Environment: Ubuntu 12.04.2 LTS, HBase 0.94.6+96-1.cdh4.3.0.p0.13~precise-cdh4.3.0 Reporter: Michael Czerwiński Assignee: Jean-Marc Spaggiari Priority: Minor The hbase-regionserver startup script always returns 0 ('exit 0' at the end of the script); this is wrong behaviour which causes issues when trying to recognise the true status of the service. Replacing it with 'exit $?' seems to fix the problem; looking at hbase-master, return codes are assigned to a RETVAL variable which is used with exit. Not sure if the problem exists in other versions.
{noformat}
/etc/init.d/hbase-regionserver.orig status
hbase-regionserver is not running.
echo $?
0
{noformat}
After fix:
{noformat}
/etc/init.d/hbase-regionserver status
hbase-regionserver is not running.
echo $?
1
{noformat}
[jira] [Commented] (HBASE-8716) Fixups/Improvements for graceful_stop.sh/region_mover.rb
[ https://issues.apache.org/jira/browse/HBASE-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693965#comment-13693965 ] Jean-Marc Spaggiari commented on HBASE-8716: I did some changes (parallelizing the moves) which make it 5 times faster. It might be even faster on big clusters. I will push the patch today for comments. Fixups/Improvements for graceful_stop.sh/region_mover.rb Key: HBASE-8716 URL: https://issues.apache.org/jira/browse/HBASE-8716 Project: HBase Issue Type: Improvement Reporter: stack Assignee: stack Fix For: 0.95.2 Attachments: 8716.txt It is a while since these scripts were touched. Giving them a spring cleaning and seeing if I can make them return error codes on failure (it seems like the previous style was that the operator would watch the output and react to it, but I see cases where tools want to call these scripts and want a return code to indicate whether the rolling upgrade worked or not). Also, seeing if I can make the rolling restart faster, since one-by-one, while minimally disruptive and 'safe', is slow on clusters of hundreds of nodes.
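region_mover.rb is JRuby, so the parallelization JM describes does not look exactly like this, but the idea maps onto a plain thread pool. A rough Java sketch, where moveRegion() is a hypothetical stand-in for the script's per-region move-and-verify step:
{code}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelMoverSketch {
  // Hypothetical stand-in for the script's move logic:
  // unassign from the source, assign to dest, wait for the region to open.
  static void moveRegion(String encodedRegionName, String destServer) { /* ... */ }

  static void moveAll(List<String> regions, List<String> servers, int threads)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int i = 0; i < regions.size(); i++) {
      final String region = regions.get(i);
      // Round-robin over the destination servers: one region per server per pass.
      final String dest = servers.get(i % servers.size());
      pool.submit(() -> moveRegion(region, dest));
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }
}
{code}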
[jira] [Commented] (HBASE-8793) Regionserver ubuntu's startup script return code always 0
[ https://issues.apache.org/jira/browse/HBASE-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694006#comment-13694006 ] stack commented on HBASE-8793: -- Yet... Regionserver ubuntu's startup script return code always 0 - Key: HBASE-8793 URL: https://issues.apache.org/jira/browse/HBASE-8793
[jira] [Commented] (HBASE-8793) Regionserver ubuntu's startup script return code always 0
[ https://issues.apache.org/jira/browse/HBASE-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694005#comment-13694005 ] stack commented on HBASE-8793: -- [~jmspaggi] Try again (smile); I added you to the 'committers' list though you ain't.. Regionserver ubuntu's startup script return code always 0 - Key: HBASE-8793 URL: https://issues.apache.org/jira/browse/HBASE-8793
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694007#comment-13694007 ] stack commented on HBASE-8370: -- [~eclark] Mighty Elliott... input? Report data block cache hit rates apart from aggregate cache hit rates -- Key: HBASE-8370 URL: https://issues.apache.org/jira/browse/HBASE-8370
[jira] [Commented] (HBASE-8789) Add max RPC version to meta-region-server zk node.
[ https://issues.apache.org/jira/browse/HBASE-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694008#comment-13694008 ] stack commented on HBASE-8789: -- Duh. Yes. +1 on patch. Add max RPC version to meta-region-server zk node. -- Key: HBASE-8789 URL: https://issues.apache.org/jira/browse/HBASE-8789
[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-8803: --- Affects Version/s: 0.98.0 0.94.8 0.95.1 region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.94.8, 0.95.1 Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Original Estimate: 48h Remaining Estimate: 48h When there are many regions in a cluster, rolling_restart can take hours because region_mover moves the regions one by one.
[jira] [Commented] (HBASE-8793) Regionserver ubuntu's startup script return code always 0
[ https://issues.apache.org/jira/browse/HBASE-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694015#comment-13694015 ] Jean-Marc Spaggiari commented on HBASE-8793: ;) Thanks for the yet ;) I just retried and sent you an email about the result. Regionserver ubuntu's startup script return code always 0 - Key: HBASE-8793 URL: https://issues.apache.org/jira/browse/HBASE-8793
[jira] [Commented] (HBASE-8228) Investigate time taken to snapshot memstore
[ https://issues.apache.org/jira/browse/HBASE-8228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694028#comment-13694028 ] Amitanand Aiyer commented on HBASE-8228: Looks like this is caused when we are using multiple memstore-flusher threads and two flush requests have a log-roll between them. To flush a region, we grab the HRegion.updatesLock.writeLock and then try to grab the HLog.cacheFlushLock.readLock(). Most of the work that happens within the lock is done in memory, so it should only take a short time ... unless we are waiting to grab the lock. HLog.rollWriter tries to grab the HLog.cacheFlushLock.writeLock(). This means that a log-roll cannot happen while a flush is already in progress. If a second flush is initiated when there is already a flush going on and a log-roll waiting (for the writer's lock), then the second flush is able to get the HRegion.updatesLock.writeLock (presumably for a different region) but will stall on the HLog.cacheFlushLock.readLock(). This is because the reader-writer lock implementation, which uses NonFairSync, makes reader lock requests wait behind the writer's request if the writer is at the head of the queue. This interleaving results in the second flush request holding the HRegion.updatesLock.writeLock() for as long as the first thread took to flush a region plus do a log roll. Swapping the order of the HRegion.updatesLock.writeLock() and startCacheFlush should probably fix this issue. Reducing the number of memstore flusher threads to 1 can also stop this behavior. Investigate time taken to snapshot memstore --- Key: HBASE-8228 URL: https://issues.apache.org/jira/browse/HBASE-8228 Project: HBase Issue Type: Sub-task Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Minor Fix For: 0.89-fb Snapshotting memstores is normally quick, but sometimes it seems to take long. This JIRA is to track the investigation and fix to improve the outliers.
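The interleaving described above is reproducible with a bare ReentrantReadWriteLock: once a writer is queued, a new read-lock request parks behind it even though another reader still holds the lock. A minimal, self-contained demo (thread roles mirror flush/log-roll; timings are illustrative):
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReaderBehindWriterDemo {
  public static void main(String[] args) throws InterruptedException {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair, like cacheFlushLock

    Thread flush1 = new Thread(() -> {   // first flush: holds the read lock a while
      lock.readLock().lock();
      try { Thread.sleep(3000); } catch (InterruptedException ignored) {}
      lock.readLock().unlock();
    });

    Thread logRoll = new Thread(() -> {  // log roll: queues for the write lock
      lock.writeLock().lock();
      lock.writeLock().unlock();
    });

    Thread flush2 = new Thread(() -> {   // second flush: a mere read lock...
      long t = System.currentTimeMillis();
      lock.readLock().lock();            // ...but it parks behind the queued writer
      System.out.println("second reader waited "
          + (System.currentTimeMillis() - t) + " ms");
      lock.readLock().unlock();
    });

    flush1.start();
    Thread.sleep(100);                   // let flush1 grab the read lock
    logRoll.start();
    Thread.sleep(100);                   // let logRoll queue for the write lock
    flush2.start();                      // typically prints ~2800 ms, not ~0
    flush1.join(); logRoll.join(); flush2.join();
  }
}
{code}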
[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-8803: --- Attachment: HBASE-8803-v0-trunk.patch region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694032#comment-13694032 ] Andrew Purtell commented on HBASE-8370: --- bq. I think we need to fix that number. +1 Configuration settings can change how we handle the different block types on a per-type basis. That's only half the story if we do not have per-block-type metrics too. Report data block cache hit rates apart from aggregate cache hit rates -- Key: HBASE-8370 URL: https://issues.apache.org/jira/browse/HBASE-8370
[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-8803: --- Status: Patch Available (was: Open) So here is what I did. First, when regions are unloaded from the server about to be restarted, instead of moving them region by region and randomly, the script now does it in round-robin mode, assigning one region per RS. So if there are 20 RS in the cluster, one being unloaded, it will move the regions 19 by 19! Then, to restore the regions, instead of doing it one by one, it now does it 10 by 10. As a result, the rolling restart now takes 16 minutes on my cluster instead of 74 minutes. And the bigger the cluster is, the faster it will be. This version is for review only; open to comments. I have tested it on 0.94, but I don't have a cluster running with trunk, so I'm not able to test it there... region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803
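A sketch of the reload half of that description: restore regions in slices of 10 and wait for each slice to finish before starting the next. The helper names are hypothetical, not the patch's actual code (which is Ruby):
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchedReloadSketch {
  static final int BATCH = 10; // regions restored concurrently, per JM's numbers

  // Hypothetical stand-in for moving one region back to the restarted server.
  static void restoreRegion(String encodedRegionName, String homeServer) { /* ... */ }

  static void reload(List<String> regions, String homeServer) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(BATCH);
    for (int start = 0; start < regions.size(); start += BATCH) {
      List<String> slice = regions.subList(start, Math.min(start + BATCH, regions.size()));
      List<Future<?>> inFlight = new ArrayList<>();
      for (String r : slice) {
        inFlight.add(pool.submit(() -> restoreRegion(r, homeServer)));
      }
      for (Future<?> f : inFlight) f.get(); // wait for the slice before the next one
    }
    pool.shutdown();
  }
}
{code}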
[jira] [Reopened] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reopened HBASE-8370: -- Report data block cache hit rates apart from aggregate cache hit rates -- Key: HBASE-8370 URL: https://issues.apache.org/jira/browse/HBASE-8370
[jira] [Resolved] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-8370. -- Resolution: Fixed Report data block cache hit rates apart from aggregate cache hit rates -- Key: HBASE-8370 URL: https://issues.apache.org/jira/browse/HBASE-8370
[jira] [Reopened] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari reopened HBASE-8803: region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803
[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-8803: --- Resolution: Fixed Status: Resolved (was: Patch Available) region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803
[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8803: - Component/s: Usability region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803
[jira] [Commented] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694044#comment-13694044 ] stack commented on HBASE-8803: -- Can you make this new behavior optional? Or at least make how many to do at a time a parameter (max concurrent threads moving and restoring)? Folks want the fast rolling restart for sure -- I know for a fact that a few of your customers need it (smile) -- but the good thing about the old behavior is that it was minimally disruptive (though slow), so it is good for a cluster that is doing hard serving. Thanks for working on this important operational issue [~jmspaggi] region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803
[jira] [Commented] (HBASE-8228) Investigate time taken to snapshot memstore
[ https://issues.apache.org/jira/browse/HBASE-8228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694064#comment-13694064 ] Himanshu Vashishtha commented on HBASE-8228: bq. Swapping the order of the HRegion.updatesLock.writeLock(), and startCacheFlush should probably fix this issue. So, you don't lock the region until you get the cacheFlushLock.readLock(). In 0.94, cacheFlushLock is still a ReentrantLock. I wonder whether multiple memstore flush threads help there at all (if multiple flushers are there in 0.94). In trunk, we no longer write flush events to hlog. Basically, a flush can happen while log rolling is going on. Investigate time taken to snapshot memstore --- Key: HBASE-8228 URL: https://issues.apache.org/jira/browse/HBASE-8228
[jira] [Commented] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694070#comment-13694070 ] Hadoop QA commented on HBASE-8803: --
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589763/HBASE-8803-v0-trunk.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.security.access.TestAccessController
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//console
This message is automatically generated.
region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803
[jira] [Commented] (HBASE-8790) NullPointerException thrown when stopping regionserver
[ https://issues.apache.org/jira/browse/HBASE-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694087#comment-13694087 ] ramkrishna.s.vasudevan commented on HBASE-8790: --- +1 on patch. NullPointerException thrown when stopping regionserver -- Key: HBASE-8790 URL: https://issues.apache.org/jira/browse/HBASE-8790 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.95.1 Environment: CentOS 5.9 x86_64, java version 1.6.0_45, CDH4.3 Reporter: Xiong LIU Assignee: Liang Xie Attachments: HBase-8790.txt The HBase cluster is a fresh start with one regionserver. When we stop HBase, an unhandled NullPointerException is thrown in the regionserver. The regionserver's log is as follows:
{noformat}
2013-06-21 10:21:11,284 INFO [regionserver61020] regionserver.HRegionServer: Closing user regions
2013-06-21 10:21:14,288 DEBUG [regionserver61020] regionserver.HRegionServer: Waiting on 1028785192
2013-06-21 10:21:14,290 FATAL [regionserver61020] regionserver.HRegionServer: ABORTING region server HOSTNAME_TEST,61020,1371781086817 : Unhandled: null
java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:988)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:832)
        at java.lang.Thread.run(Thread.java:662)
2013-06-21 10:21:14,292 FATAL [regionserver61020] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
2013-06-21 10:21:14,293 INFO [regionserver61020] regionserver.HRegionServer: STOPPED: Unhandled: null
2013-06-21 10:21:14,293 INFO [regionserver61020] ipc.RpcServer: Stopping server on 61020
{noformat}
It seems that after closing user regions, the rssStub is null. Update: we found that if we set hbase.client.ipc.pool.type to RoundRobinPool (or another pool type) and hbase.client.ipc.pool.size to 10 (possibly other values) in hbase-site.xml, the regionserver continuously attempts to connect to the master, and if we stop HBase, the above NullPointerException occurs. With hbase.client.ipc.pool.size set to 1, the cluster can be stopped completely.
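The usual fix pattern for this kind of NPE is to copy the stub into a local variable and null-check it before use, since the stop path can clear the field at any time. A generic, runnable sketch of that pattern (the Runnable stands in for the real master-report stub; this is not the attached patch):
{code}
import java.util.concurrent.atomic.AtomicReference;

public class StubNullCheckSketch {
  // Stand-in for HRegionServer.rssStub: a stub that the stop path clears.
  static final AtomicReference<Runnable> rssStub = new AtomicReference<>();

  static void tryRegionServerReport() {
    Runnable rss = rssStub.get(); // copy once; the field may be cleared concurrently
    if (rss == null) {
      return;                     // stopping / not yet connected: skip the report
    }
    rss.run();                    // safe: our local copy cannot become null
  }

  public static void main(String[] args) {
    tryRegionServerReport();      // no NPE even though the stub is unset
    rssStub.set(() -> System.out.println("report sent"));
    tryRegionServerReport();
  }
}
{code}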
[jira] [Updated] (HBASE-8737) [replication] Change replication RPC to use cell blocks
[ https://issues.apache.org/jira/browse/HBASE-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8737: - Attachment: 0001-HBASE-8737-replication-Change-replication-RPC-to-use.patch Here is a patch that does both sides. Trying against hadoopqa. [replication] Change replication RPC to use cell blocks --- Key: HBASE-8737 URL: https://issues.apache.org/jira/browse/HBASE-8737 Project: HBase Issue Type: Improvement Components: Replication Reporter: Chris Trezzo Assignee: stack Priority: Critical Fix For: 0.95.2 Attachments: 0001-HBASE-8737-replication-Change-replication-RPC-to-use.patch, 8737.txt Currently, the replication rpc that ships edits simply dumps the byte value of WAL edit key/value pairs into a protobuf message. Modify the replication rpc mechanism to use cell blocks so it can leverage encoding and compression.
[jira] [Updated] (HBASE-8737) [replication] Change replication RPC to use cell blocks
[ https://issues.apache.org/jira/browse/HBASE-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8737: - Fix Version/s: 0.98.0 Status: Patch Available (was: Open) [replication] Change replication RPC to use cell blocks --- Key: HBASE-8737 URL: https://issues.apache.org/jira/browse/HBASE-8737
[jira] [Commented] (HBASE-8737) [replication] Change replication RPC to use cell blocks
[ https://issues.apache.org/jira/browse/HBASE-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694118#comment-13694118 ] Hadoop QA commented on HBASE-8737: --
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589774/0001-HBASE-8737-replication-Change-replication-RPC-to-use.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning message.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.TestCheckTestClasses
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//console
This message is automatically generated.
[replication] Change replication RPC to use cell blocks --- Key: HBASE-8737 URL: https://issues.apache.org/jira/browse/HBASE-8737
[jira] [Commented] (HBASE-8806) Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for duplicate rows.
[ https://issues.apache.org/jira/browse/HBASE-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694130#comment-13694130 ] Dave Latham commented on HBASE-8806: A little more background on how this came up. We're currently replicating writes in both directions between two large clusters. Occasionally we would see one node's replication queue start falling behind, and once it got behind it appeared to go slower than it did while it was caught up! It would get into a cycle of replicating a batch of 25000 edits with each batch taking something like 3 minutes. Examining threads on the node receiving the writes would show the handler thread in stacks like:
{noformat}
"IPC Server handler 68 on 60020" daemon prio=10 tid=0x2aaac0d14800 nid=0x3548 runnable [0x4
   java.lang.Thread.State: RUNNABLE
        at java.util.ArrayList.<init>(ArrayList.java:112)
        at com.google.common.collect.Lists.newArrayListWithCapacity(Lists.java:168)
        at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2129)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2059)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3571)
        at sun.reflect.GeneratedMethodAccessor83.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
{noformat}
The 25000 edits were being sorted by row, with many rows ending up having multiple puts in a batch. Each time HRegion.doMiniBatchMutation encounters multiple puts to the same row, it fails to acquire the lock on that row for the second put, slowing it down. This patch makes it able to handle the full batch in one go. Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for duplicate rows. Key: HBASE-8806 URL: https://issues.apache.org/jira/browse/HBASE-8806 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.5 Reporter: rahul gidwani Fix For: 0.95.2, 0.94.10 Attachments: HBASE-8806-0.94.10.patch, HBASE-8806-0.94.10-v2.patch If we already have the lock in doMiniBatchMutation we don't need to re-acquire it. The solution is to keep a cache of the row keys already locked for a miniBatchMutation; if we already have the row key in the cache, we don't repeatedly try to acquire the lock. A fix to this problem is to keep a set of rows we already locked and not try to acquire the lock for these rows. We have tested this fix in our production environment and it has improved replication performance quite a bit. We saw a replication batch go from 3+ minutes to less than 10 seconds for batches with duplicate row keys.
{code}
static int ACQUIRE_LOCK_COUNT = 0;

@Test
public void testRedundantRowKeys() throws Exception {
  final int batchSize = 10;
  String tableName = getClass().getSimpleName();
  Configuration conf = HBaseConfiguration.create();
  conf.setClass(HConstants.REGION_IMPL, MockHRegion.class, HeapSize.class);
  MockHRegion region = (MockHRegion) TestHRegion.initHRegion(
      Bytes.toBytes(tableName), tableName, conf, Bytes.toBytes("a"));
  List<Pair<Mutation, Integer>> someBatch = Lists.newArrayList();
  int i = 0;
  while (i < batchSize) {
    if (i % 2 == 0) {
      someBatch.add(new Pair<Mutation, Integer>(new Put(Bytes.toBytes("0")), null));
    } else {
      someBatch.add(new Pair<Mutation, Integer>(new Put(Bytes.toBytes("1")), null));
    }
    i++;
  }
  long startTime = System.currentTimeMillis();
  region.batchMutate(someBatch.toArray(new Pair[0]));
  long endTime = System.currentTimeMillis();
  long duration = endTime - startTime;
  System.out.println("duration: " + duration + " ms");
  assertEquals(2, ACQUIRE_LOCK_COUNT);
}

@Override
public Integer getLock(Integer lockid, byte[] row, boolean waitForLock) throws IOException {
  ACQUIRE_LOCK_COUNT++;
  return super.getLock(lockid, row, waitForLock);
}
{code}
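A sketch of the dedup idea from the description: remember which rows this mini-batch already locked and reuse the lock id for repeats. Names are hypothetical; this is not the committed patch:
{code}
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.hbase.util.Bytes;

public class RowLockDedupSketch {
  // Rows locked so far in this mini-batch, keyed by row bytes.
  // A TreeMap with a byte[] comparator avoids byte[]'s identity-based equals.
  private final Map<byte[], Integer> acquired =
      new TreeMap<byte[], Integer>(Bytes.BYTES_COMPARATOR);

  // Stand-in for the real HRegion.getLock(null, row, true) call.
  private Integer acquireRowLock(byte[] row) { return acquired.size() + 1; }

  Integer lockOnce(byte[] row) {
    Integer existing = acquired.get(row);
    if (existing != null) {
      return existing;            // duplicate row in the batch: reuse the lock
    }
    Integer lockId = acquireRowLock(row);
    acquired.put(row, lockId);
    return lockId;
  }
}
{code}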
[jira] [Updated] (HBASE-7411) Use Netflix's Curator zookeeper library
[ https://issues.apache.org/jira/browse/HBASE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7411: - Attachment: 7411v3.txt Removed TestRecoverableZooKeeper; it expects to be able to replace zk, which is not possible when Curator is managing zk. Fixed a place where getZk could come back null (there may be others). Use Netflix's Curator zookeeper library --- Key: HBASE-7411 URL: https://issues.apache.org/jira/browse/HBASE-7411 Project: HBase Issue Type: New Feature Components: Zookeeper Affects Versions: 0.95.2 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.95.2 Attachments: 7411v2.txt, 7411v2.txt, 7411v3.txt, hbase-7411_v0.patch We have mentioned using the Curator library (https://github.com/Netflix/curator) elsewhere, but we can continue the discussion here. The advantage of the Curator lib over ours is the recipes. We have a very similar retrying mechanism, and we don't need much of the nice client-API layer. We also have a similar Listener interface, etc. I think we can decide on one of the following options:
1. Do not depend on Curator. We have some of the recipes, and some custom recipes (ZKAssign, leader election, etc. already working, locks in HBASE-5991, etc). We can also copy / fork some code from there.
2. Replace all of our zk usage / connection management with Curator. We may keep the current set of APIs as a thin wrapper.
3. Use our own connection management / retry logic, and build a custom CuratorFramework implementation for the Curator recipes. This will keep the current zk logic/code intact, and allow us to use curator-recipes as we see fit.
4. Allow both Curator and our zk layer to manage the connection. We will still have 1 connection, but 2 abstraction layers sharing it. This is the easiest to implement, but a freak show?
I have a patch for 4, and am now prototyping 2 or 3, whichever will be less painful. Related issues: HBASE-5547 HBASE-7305 HBASE-7212
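For those who have not used Curator: connection management and retries live behind a fluent client, which is the layer options 2-4 would put HBase's zk usage on. A minimal usage sketch using the Netflix-era package names current at the time (the HBase integration itself is what the patch explores; the paths and timings are illustrative):
{code}
import com.netflix.curator.framework.CuratorFramework;
import com.netflix.curator.framework.CuratorFrameworkFactory;
import com.netflix.curator.retry.ExponentialBackoffRetry;

public class CuratorSketch {
  public static void main(String[] args) throws Exception {
    // Connection management + retry logic live in the framework, not the caller.
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "localhost:2181",
        new ExponentialBackoffRetry(1000 /* base sleep ms */, 3 /* max retries */));
    client.start();
    try {
      client.create().creatingParentsIfNeeded()
            .forPath("/hbase-demo/test", "hello".getBytes());
      byte[] data = client.getData().forPath("/hbase-demo/test");
      System.out.println(new String(data));
    } finally {
      client.close();
    }
  }
}
{code}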
[jira] [Commented] (HBASE-7411) Use Netflix's Curator zookeeper library
[ https://issues.apache.org/jira/browse/HBASE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694143#comment-13694143 ] Hadoop QA commented on HBASE-7411: --
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589780/7411v3.txt against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 7 new or modified tests.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6148//console
This message is automatically generated.
Use Netflix's Curator zookeeper library --- Key: HBASE-7411 URL: https://issues.apache.org/jira/browse/HBASE-7411
[jira] [Updated] (HBASE-8800) Return non-zero exit codes when a region server aborts
[ https://issues.apache.org/jira/browse/HBASE-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8800: - Resolution: Fixed Status: Resolved (was: Patch Available) Resolving as committed. Probably too radical a change for 0.94, but will let [~lhofhansl] make the call. Return non-zero exit codes when a region server aborts -- Key: HBASE-8800 URL: https://issues.apache.org/jira/browse/HBASE-8800 Project: HBase Issue Type: Improvement Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.95.2 Attachments: HBASE-8800.patch There are a few exit code-related jiras flying around, but it seems that at least for the region server we have a bigger problem: it always returns 0 when exiting once it's started. I also saw that we have a couple of -1 exit codes; AFAIK these should be 1 (or at least a positive number).
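The convention at stake, in miniature: exit 0 only on a clean stop and a positive code (not -1) on abort, so init scripts and calling tools can inspect $?. A tiny illustrative sketch with a hypothetical runServer() helper, not the actual patch:
{code}
public class ExitCodeSketch {
  // Stand-in for the region server run loop; true means a clean stop.
  static boolean runServer() { return false; /* pretend we aborted */ }

  public static void main(String[] args) {
    boolean cleanStop = runServer();
    // Exit 0 only on a clean stop; a positive code (not -1) signals the
    // abort to init scripts and tools that inspect $?.
    System.exit(cleanStop ? 0 : 1);
  }
}
{code}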
[jira] [Commented] (HBASE-8806) Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for duplicate rows.
[ https://issues.apache.org/jira/browse/HBASE-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694147#comment-13694147 ] rahul gidwani commented on HBASE-8806: -- I will provide a patch for trunk, no problem. I should have it by tomorrow. Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for duplicate rows. Key: HBASE-8806 URL: https://issues.apache.org/jira/browse/HBASE-8806
[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694151#comment-13694151 ] Nicolas Liochon commented on HBASE-6295: For the logs, it's a bug, easy to fix. I will do it. For the failure itself, the integration test uses a retry count of 10. This is not enough. If I increase it to 30 it succeeds 5 times out of 5, while I get a 60% failure rate with a value of 10. The integration test runs with the value found in hbase-server/.../test/resources, and this value was not changed by the various jiras we had about this default value. I will run more tests during the night, but this seems to be it. Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Labels: noob Fix For: 0.98.0, 0.95.2 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch Today the batch algo is:
{noformat}
for (Operation o : List<Op>) {
  add o to todolist
  if todolist > maxsize or o last in list
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
}
{noformat}
We could:
- create the final object immediately instead of an intermediate array
- split per location immediately
- instead of sending when the list as a whole is full, send when there is enough data for a single location
It would be:
{noformat}
for (Operation o : List<Op>) {
  get location
  add o to location.todolist
  if (location.todolist > maxLocationSize)
    send location.todolist to region server
    clear location.todolist
  // don't wait, continue the loop
}
send remaining
wait
{noformat}
It's not trivial to write once you add error management: the retried list must be shared with the operations added to the todolist. But it's doable. It's interesting mainly for 'big' writes.
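A compact Java rendering of the proposed algorithm: buffer ops per location and flush a location's buffer as soon as it alone is full, instead of waiting on the global list. The Locator/Sender types are assumptions standing in for region lookup and the actual RPC:
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerLocationBatcherSketch<Op> {
  interface Locator<Op> { String locationOf(Op op); }        // hypothetical region lookup
  interface Sender<Op> { void send(String location, List<Op> ops); }

  private final Map<String, List<Op>> buffers = new HashMap<>();
  private final int maxLocationSize;
  private final Locator<Op> locator;
  private final Sender<Op> sender;

  PerLocationBatcherSketch(int maxLocationSize, Locator<Op> locator, Sender<Op> sender) {
    this.maxLocationSize = maxLocationSize;
    this.locator = locator;
    this.sender = sender;
  }

  void add(Op op) {
    String loc = locator.locationOf(op);
    List<Op> buf = buffers.computeIfAbsent(loc, k -> new ArrayList<>());
    buf.add(op);
    if (buf.size() >= maxLocationSize) { // this location is full: send now,
      sender.send(loc, buf);             // don't wait for the global batch
      buffers.put(loc, new ArrayList<>());
    }
  }

  void flushRemaining() {                // the "send remaining; wait" tail step
    buffers.forEach(sender::send);
    buffers.clear();
  }
}
{code}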
[jira] [Updated] (HBASE-7411) Use Netflix's Curator zookeeper library
[ https://issues.apache.org/jira/browse/HBASE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7411: - Attachment: 7411v4.txt Rebase Use Netflix's Curator zookeeper library --- Key: HBASE-7411 URL: https://issues.apache.org/jira/browse/HBASE-7411 Project: HBase Issue Type: New Feature Components: Zookeeper Affects Versions: 0.95.2 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.95.2 Attachments: 7411v2.txt, 7411v2.txt, 7411v3.txt, 7411v4.txt, hbase-7411_v0.patch We have mentioned using the Curator library (https://github.com/Netflix/curator) elsewhere but we can continue the discussion in this. The advantages for the curator lib over ours are the recipes. We have very similar retrying mechanism, and we don't need much of the nice client-API layer. We also have similar Listener interface, etc. I think we can decide on one of the following options: 1. Do not depend on curator. We have some of the recipes, and some custom recipes (ZKAssign, Leader election, etc already working, locks in HBASE-5991, etc). We can also copy / fork some code from there. 2. Replace all of our zk usage / connection management to curator. We may keep the current set of API's as a thin wrapper. 3. Use our own connection management / retry logic, and build a custom CuratorFramework implementation for the curator recipes. This will keep the current zk logic/code intact, and allow us to use curator-recipes as we see fit. 4. Allow both curator and our zk layer to manage the connection. We will still have 1 connection, but 2 abstraction layers sharing it. This is the easiest to implement, but a freak show? I have a patch for 4, and now prototyping 2 or 3 whichever will be less painful. Related issues: HBASE-5547 HBASE-7305 HBASE-7212 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-8809) Include deletes in the scan (setRaw) method does not respect the time range or the filter
Vasu Mariyala created HBASE-8809: Summary: Include deletes in the scan (setRaw) method does not respect the time range or the filter Key: HBASE-8809 URL: https://issues.apache.org/jira/browse/HBASE-8809 Project: HBase Issue Type: Bug Components: Scanners Reporter: Vasu Mariyala If a row has been deleted at time stamp 'T' and a scan with time range (0, T-1) is executed, it still returns the delete marker at time stamp 'T'. It is because of the code in ScanQueryMatcher.java:
if (retainDeletesInOutput
    || (!isUserScan && (EnvironmentEdgeManager.currentTimeMillis() - timestamp) <= timeToPurgeDeletes)
    || kv.getMemstoreTS() > maxReadPointToTrackVersions) {
  // always include or it is not time yet to check whether it is OK
  // to purge deletes or not
  return MatchCode.INCLUDE;
}
The assumption is that a scan (even with setRaw set to true) should respect the filters and the time range specified. Please let me know if you think this behavior can be changed so that I can provide a patch for it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
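A short client-side sketch of the reported mismatch, assuming a table handle and the delete timestamp T are already set up; Scan.setRaw and Scan.setTimeRange are the real client APIs:
{code}
Scan scan = new Scan();
scan.setRaw(true);             // include delete markers in the results
scan.setTimeRange(0, T - 1);   // upper bound below the delete's timestamp
ResultScanner scanner = table.getScanner(scan);
for (Result r : scanner) {
  // Expected: nothing at timestamp T. Reported behavior: the delete marker
  // at T is still returned, because the INCLUDE short-circuit quoted above
  // runs before the time-range and filter checks.
}
scanner.close();
{code}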
[jira] [Commented] (HBASE-8732) Changing Encoding on Column Families errors out
[ https://issues.apache.org/jira/browse/HBASE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694161#comment-13694161 ] stack commented on HBASE-8732: -- How you think this happened [~eclark]? I created a table w/ fast diff and am able to scan it. I then altered the table to disable FAST_DIFF and can still scan (even after writing). You see this on your rig? Changing Encoding on Column Families errors out --- Key: HBASE-8732 URL: https://issues.apache.org/jira/browse/HBASE-8732 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.0, 0.95.1 Reporter: Elliott Clark Priority: Critical Fix For: 0.95.2 Getting an error when opening a scanner on a file that has no encoding. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694163#comment-13694163 ] stack commented on HBASE-6295: -- +1 on committing bug fix and upping retry count as addendum on this issue. Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Labels: noob Fix For: 0.98.0, 0.95.2 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch Today the batch algo is:
{noformat}
for (Operation o : List<Op>) {
  add o to todolist
  if todolist > maxsize or o last in list
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
}
{noformat}
We could:
- create immediately the final object instead of an intermediate array
- split per location immediately
- instead of sending when the list as a whole is full, send it when there is enough data for a single location
It would be:
{noformat}
for (Operation o : List<Op>) {
  get location
  add o to location.todolist
  if (location.todolist > maxLocationSize)
    send location.todolist to region server
    clear location.todolist
    // don't wait, continue the loop
}
send remaining
wait
{noformat}
It's not trivial to write if you add error management: the retried list must be shared with the operations added in the todolist. But it's doable. It's interesting mainly for 'big' writes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8732) Changing Encoding on Column Families errors out
[ https://issues.apache.org/jira/browse/HBASE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694169#comment-13694169 ] Elliott Clark commented on HBASE-8732: -- Not really sure. I see it locally actually. When running the IT test from HBASE-8726 in maven or an ide. Changing Encoding on Column Families errors out --- Key: HBASE-8732 URL: https://issues.apache.org/jira/browse/HBASE-8732 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.0, 0.95.1 Reporter: Elliott Clark Priority: Critical Fix For: 0.95.2 Getting an error when opening a scanner on a file that has no encoding. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694178#comment-13694178 ] Elliott Clark commented on HBASE-6295: -- So I see this issue on a real cluster where the local conf is added to the classpath ahead of any jars. How would the test settings be causing this? Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Labels: noob Fix For: 0.98.0, 0.95.2 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch Today the batch algo is:
{noformat}
for (Operation o : List<Op>) {
  add o to todolist
  if todolist > maxsize or o last in list
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
}
{noformat}
We could:
- create immediately the final object instead of an intermediate array
- split per location immediately
- instead of sending when the list as a whole is full, send it when there is enough data for a single location
It would be:
{noformat}
for (Operation o : List<Op>) {
  get location
  add o to location.todolist
  if (location.todolist > maxLocationSize)
    send location.todolist to region server
    clear location.todolist
    // don't wait, continue the loop
}
send remaining
wait
{noformat}
It's not trivial to write if you add error management: the retried list must be shared with the operations added in the todolist. But it's doable. It's interesting mainly for 'big' writes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8732) Changing Encoding on Column Families errors out
[ https://issues.apache.org/jira/browse/HBASE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694181#comment-13694181 ] stack commented on HBASE-8732: -- [~eclark] Let me try... Changing Encoding on Column Families errors out --- Key: HBASE-8732 URL: https://issues.apache.org/jira/browse/HBASE-8732 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.0, 0.95.1 Reporter: Elliott Clark Priority: Critical Fix For: 0.95.2 Getting an error when opening a scanner on a file that has no encoding. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694180#comment-13694180 ] Jean-Marc Spaggiari commented on HBASE-8803: Sure! I will add a parameter like maxthreads with a default value of 1, so with no parameters it will act as before, but with this parameter we will be able to speed up the process. region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803 Project: HBase Issue Type: Bug Components: Usability Affects Versions: 0.98.0, 0.94.8, 0.95.1 Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Attachments: HBASE-8803-v0-trunk.patch Original Estimate: 48h Remaining Estimate: 48h When there are many regions in a cluster, rolling_restart can take hours because region_mover is moving the regions one by one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
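The shape of the proposed change, sketched in Java for illustration (the script itself is JRuby); moveRegion and pickTarget are placeholders for the per-region move the script already performs:
{code}
// Move up to maxthreads regions concurrently instead of one by one.
ExecutorService pool = Executors.newFixedThreadPool(maxthreads);
for (final HRegionInfo region : regionsToMove) {
  pool.submit(new Runnable() {
    public void run() {
      moveRegion(region, pickTarget());  // placeholder for the actual move
    }
  });
}
pool.shutdown();
pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);  // wait for all moves
{code}
With maxthreads = 1 this degenerates to the current one-region-at-a-time behavior, which is why the default keeps the old semantics.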
[jira] [Reopened] (HBASE-8776) tweak retry settings some more (on trunk and 0.94)
[ https://issues.apache.org/jira/browse/HBASE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reopened HBASE-8776: - tweak retry settings some more (on trunk and 0.94) -- Key: HBASE-8776 URL: https://issues.apache.org/jira/browse/HBASE-8776 Project: HBase Issue Type: Bug Affects Versions: 0.94.8 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.95.2, 0.94.10 Attachments: HBASE-8776-v0.patch, HBASE-8776-v1.patch, HBASE-8776-v1-trunk.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694186#comment-13694186 ] Sergey Shelukhin commented on HBASE-6295: - I think it might have been caused by the retry tweaking (the thing we discussed yesterday about the pause length). The pause is reduced to 100ms on trunk, while being 1000ms on 0.94, so current trunk retries are too short. Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Labels: noob Fix For: 0.98.0, 0.95.2 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch Today the batch algo is:
{noformat}
for (Operation o : List<Op>) {
  add o to todolist
  if todolist > maxsize or o last in list
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
}
{noformat}
We could:
- create immediately the final object instead of an intermediate array
- split per location immediately
- instead of sending when the list as a whole is full, send it when there is enough data for a single location
It would be:
{noformat}
for (Operation o : List<Op>) {
  get location
  add o to location.todolist
  if (location.todolist > maxLocationSize)
    send location.todolist to region server
    clear location.todolist
    // don't wait, continue the loop
}
send remaining
wait
{noformat}
It's not trivial to write if you add error management: the retried list must be shared with the operations added in the todolist. But it's doable. It's interesting mainly for 'big' writes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8776) tweak retry settings some more (on trunk and 0.94)
[ https://issues.apache.org/jira/browse/HBASE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694189#comment-13694189 ] Sergey Shelukhin commented on HBASE-8776: - hmm, the trunk pause is apparently reduced to 100ms, so these retry settings are only correct for 94. To get the same length on trunk, we'd need to put 128 back and dial the retries up to ~35 tweak retry settings some more (on trunk and 0.94) -- Key: HBASE-8776 URL: https://issues.apache.org/jira/browse/HBASE-8776 Project: HBase Issue Type: Bug Affects Versions: 0.94.8 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.95.2, 0.94.10 Attachments: HBASE-8776-v0.patch, HBASE-8776-v1.patch, HBASE-8776-v1-trunk.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
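A back-of-the-envelope check of the total client wait being discussed, using an assumed multiplier table for illustration (the real values live in HConstants.RETRY_BACKOFF):
{code}
// Total sleep across retries = pause * sum of backoff multipliers.
int[] backoff = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100};  // assumption; see HConstants.RETRY_BACKOFF
int retries = 10;
long pauseMs = 100;                                        // trunk value discussed above
long totalMs = 0;
for (int i = 0; i < retries && i < backoff.length; i++) {
  totalMs += pauseMs * backoff[i];
}
// ~38s total sleep with pause=100ms, versus ~6.3 minutes with pause=1000ms,
// which is why the retry count would need to rise to keep the same window.
{code}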
[jira] [Commented] (HBASE-8776) tweak retry settings some more (on trunk and 0.94)
[ https://issues.apache.org/jira/browse/HBASE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694191#comment-13694191 ] Sergey Shelukhin commented on HBASE-8776: - or I can just revert the change from trunk and 95. Any objections? tweak retry settings some more (on trunk and 0.94) -- Key: HBASE-8776 URL: https://issues.apache.org/jira/browse/HBASE-8776 Project: HBase Issue Type: Bug Affects Versions: 0.94.8 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.95.2, 0.94.10 Attachments: HBASE-8776-v0.patch, HBASE-8776-v1.patch, HBASE-8776-v1-trunk.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694194#comment-13694194 ] Elliott Clark commented on HBASE-6295: -- This started failing before the retry tweaks went in. And you were correct yesterday: trunk is still at 1000ms as the default pause time. ( https://github.com/apache/hbase/blob/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java#L554 ) Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Labels: noob Fix For: 0.98.0, 0.95.2 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch Today the batch algo is:
{noformat}
for (Operation o : List<Op>) {
  add o to todolist
  if todolist > maxsize or o last in list
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
}
{noformat}
We could:
- create immediately the final object instead of an intermediate array
- split per location immediately
- instead of sending when the list as a whole is full, send it when there is enough data for a single location
It would be:
{noformat}
for (Operation o : List<Op>) {
  get location
  add o to location.todolist
  if (location.todolist > maxLocationSize)
    send location.todolist to region server
    clear location.todolist
    // don't wait, continue the loop
}
send remaining
wait
{noformat}
It's not trivial to write if you add error management: the retried list must be shared with the operations added in the todolist. But it's doable. It's interesting mainly for 'big' writes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-8803: --- Attachment: HBASE-8803-v1-trunk.patch region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803 Project: HBase Issue Type: Bug Components: Usability Affects Versions: 0.98.0, 0.94.8, 0.95.1 Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Attachments: HBASE-8803-v0-trunk.patch, HBASE-8803-v1-trunk.patch Original Estimate: 48h Remaining Estimate: 48h When there are many regions in a cluster, rolling_restart can take hours because region_mover is moving the regions one by one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-8803: --- Status: Patch Available (was: Reopened) Here we go. Updated version: - Added a maxthreads parameter to region_mover.rb and graceful_stop.sh. - If maxthreads is RS then go random, else go roundrobin. Tested on 0.94; I don't have a way to test it on trunk :( region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803 Project: HBase Issue Type: Bug Components: Usability Affects Versions: 0.95.1, 0.94.8, 0.98.0 Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Attachments: HBASE-8803-v0-trunk.patch, HBASE-8803-v1-trunk.patch Original Estimate: 48h Remaining Estimate: 48h When there are many regions in a cluster, rolling_restart can take hours because region_mover is moving the regions one by one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694205#comment-13694205 ] Elliott Clark commented on HBASE-6295: -- oh blah, never mind, it's just that the fallback default wasn't changed but the xml was. It really is a 100ms base pause time. Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Labels: noob Fix For: 0.98.0, 0.95.2 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch Today the batch algo is:
{noformat}
for (Operation o : List<Op>) {
  add o to todolist
  if todolist > maxsize or o last in list
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
}
{noformat}
We could:
- create immediately the final object instead of an intermediate array
- split per location immediately
- instead of sending when the list as a whole is full, send it when there is enough data for a single location
It would be:
{noformat}
for (Operation o : List<Op>) {
  get location
  add o to location.todolist
  if (location.todolist > maxLocationSize)
    send location.todolist to region server
    clear location.todolist
    // don't wait, continue the loop
}
send remaining
wait
{noformat}
It's not trivial to write if you add error management: the retried list must be shared with the operations added in the todolist. But it's doable. It's interesting mainly for 'big' writes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694205#comment-13694205 ] Elliott Clark edited comment on HBASE-6295 at 6/26/13 8:01 PM: --- oh blah, never mind, it's just that the fallback default wasn't changed but the xml was. It really is a 100ms base pause time. I'll file a jira to make them all the same to stop confusion in the future. was (Author: eclark): oh blah, never mind, it's just that the fallback default wasn't changed but the xml was. It really is a 100ms base pause time. Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Labels: noob Fix For: 0.98.0, 0.95.2 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch Today the batch algo is:
{noformat}
for (Operation o : List<Op>) {
  add o to todolist
  if todolist > maxsize or o last in list
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
}
{noformat}
We could:
- create immediately the final object instead of an intermediate array
- split per location immediately
- instead of sending when the list as a whole is full, send it when there is enough data for a single location
It would be:
{noformat}
for (Operation o : List<Op>) {
  get location
  add o to location.todolist
  if (location.todolist > maxLocationSize)
    send location.todolist to region server
    clear location.todolist
    // don't wait, continue the loop
}
send remaining
wait
{noformat}
It's not trivial to write if you add error management: the retried list must be shared with the operations added in the todolist. But it's doable. It's interesting mainly for 'big' writes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7411) Use Netflix's Curator zookeeper library
[ https://issues.apache.org/jira/browse/HBASE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694216#comment-13694216 ] Hadoop QA commented on HBASE-7411: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589781/7411v4.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.security.access.TestAccessController {color:red}-1 core zombie tests{color}. There are 1 zombie test(s): at org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:475) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//console This message is automatically generated. Use Netflix's Curator zookeeper library --- Key: HBASE-7411 URL: https://issues.apache.org/jira/browse/HBASE-7411 Project: HBase Issue Type: New Feature Components: Zookeeper Affects Versions: 0.95.2 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.95.2 Attachments: 7411v2.txt, 7411v2.txt, 7411v3.txt, 7411v4.txt, hbase-7411_v0.patch We have mentioned using the Curator library (https://github.com/Netflix/curator) elsewhere but we can continue the discussion in this. 
The advantages for the curator lib over ours are the recipes. We have very similar retrying mechanism, and we don't need much of the nice client-API layer. We also have similar Listener interface, etc. I think we can decide on one of the following options: 1. Do not depend on curator. We have some of the recipes, and some custom recipes (ZKAssign, Leader election, etc already working, locks in HBASE-5991, etc). We can also copy / fork some code from there. 2. Replace all of our zk usage / connection management to curator. We may keep the current set of API's as a thin wrapper. 3. Use our own connection management / retry logic, and build a custom CuratorFramework implementation for the curator recipes. This will keep the current zk logic/code intact, and allow us to use curator-recipes as we see fit. 4. Allow both curator and our zk layer to manage the connection. We will still have 1 connection, but 2 abstraction layers sharing it. This is the easiest to implement, but a freak show? I have a patch for 4, and now prototyping 2 or 3 whichever will be less painful. Related issues:
[jira] [Created] (HBASE-8810) Bring in code constants in line with default xml's
Elliott Clark created HBASE-8810: Summary: Bring in code constants in line with default xml's Key: HBASE-8810 URL: https://issues.apache.org/jira/browse/HBASE-8810 Project: HBase Issue Type: Bug Reporter: Elliott Clark After the defaults were changed in the xml some constants were left the same. DEFAULT_HBASE_CLIENT_PAUSE for example. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
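The class of mismatch being filed, sketched with the pause example from the description; the concrete values here are illustrative:
{code}
// hbase-default.xml now ships hbase.client.pause = 100, but the code fallback
// constant may still carry the old value:
long pause = conf.getLong(HConstants.HBASE_CLIENT_PAUSE,
    HConstants.DEFAULT_HBASE_CLIENT_PAUSE);  // stale constant if xml is absent
// Anyone reading the constant directly, or running without the XML resource
// on the classpath, silently gets the old default.
{code}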
[jira] [Commented] (HBASE-8774) Add BatchSize and Filter to Thrift2
[ https://issues.apache.org/jira/browse/HBASE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694229#comment-13694229 ] Hamed Madani commented on HBASE-8774: - Yes Lars. The reason I followed HBASE-4176 was that it made it easier for us to move from thrift1 to thrift2. Also, it has been tested before with folks running thrift1. Moreover, from what I understand from HBASE-6073, it seems
{code}
/**
 * Specify boolean operator for TFilterList:
 * - MUST_PASS_ALL means AND boolean operation
 * - MUST_PASS_ONE means OR boolean operation
 */
enum TFilterListOperator {
  MUST_PASS_ALL = 0,
  MUST_PASS_ONE = 1
}

/**
 * Represents a server side filter list
 */
struct TFilterList {
  1: required TFilterListOperator operator,
  2: required list<TFilter> filters
}
{code}
limits you to either *AND* all filters or *OR* all of them, whereas with HBASE-4176 you can have something like "(Filter1 AND Filter2) OR Filter3". HBASE-6073 also closes the scanner when it doesn't have any more results to return. I can certainly merge the two to include this feature. Let me know what you think. Add BatchSize and Filter to Thrift2 --- Key: HBASE-8774 URL: https://issues.apache.org/jira/browse/HBASE-8774 Project: HBase Issue Type: New Feature Components: Thrift Affects Versions: 0.95.1 Reporter: Hamed Madani Assignee: Hamed Madani Attachments: HBASE_8774.patch The attached patch will add BatchSize and Filter support to Thrift2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
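For reference, the hierarchical composition being argued for, expressed with the Java-side FilterList API that a thrift layer would ultimately map onto (filter1/2/3 stand for arbitrary Filter instances):
{code}
// "(Filter1 AND Filter2) OR Filter3" as nested FilterLists:
FilterList inner = new FilterList(FilterList.Operator.MUST_PASS_ALL,
    Arrays.<Filter>asList(filter1, filter2));   // Filter1 AND Filter2
FilterList outer = new FilterList(FilterList.Operator.MUST_PASS_ONE,
    Arrays.<Filter>asList(inner, filter3));     // (...) OR Filter3
scan.setFilter(outer);
{code}
A flat TFilterList with a single operator can only express this if a TFilter can itself wrap another TFilterList, which is the nesting question at issue here.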
[jira] [Commented] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694244#comment-13694244 ] Hadoop QA commented on HBASE-8803: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589789/HBASE-8803-v1-trunk.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.security.access.TestAccessController org.apache.hadoop.hbase.TestIOFencing Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//console This message is automatically generated. region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803 Project: HBase Issue Type: Bug Components: Usability Affects Versions: 0.98.0, 0.94.8, 0.95.1 Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Attachments: HBASE-8803-v0-trunk.patch, HBASE-8803-v1-trunk.patch Original Estimate: 48h Remaining Estimate: 48h When there are many regions in a cluster, rolling_restart can take hours because region_mover is moving the regions one by one. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8776) tweak retry settings some more (on trunk and 0.94)
[ https://issues.apache.org/jira/browse/HBASE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694252#comment-13694252 ] stack commented on HBASE-8776: -- Do whatever you need [~sershe] to get to 5 minutes or so in trunk. Thanks. tweak retry settings some more (on trunk and 0.94) -- Key: HBASE-8776 URL: https://issues.apache.org/jira/browse/HBASE-8776 Project: HBase Issue Type: Bug Affects Versions: 0.94.8 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.95.2, 0.94.10 Attachments: HBASE-8776-v0.patch, HBASE-8776-v1.patch, HBASE-8776-v1-trunk.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7411) Use Netflix's Curator zookeeper library
[ https://issues.apache.org/jira/browse/HBASE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7411: - Attachment: 7411v4.txt Retry Use Netflix's Curator zookeeper library --- Key: HBASE-7411 URL: https://issues.apache.org/jira/browse/HBASE-7411 Project: HBase Issue Type: New Feature Components: Zookeeper Affects Versions: 0.95.2 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.95.2 Attachments: 7411v2.txt, 7411v2.txt, 7411v3.txt, 7411v4.txt, 7411v4.txt, hbase-7411_v0.patch We have mentioned using the Curator library (https://github.com/Netflix/curator) elsewhere but we can continue the discussion in this. The advantages for the curator lib over ours are the recipes. We have very similar retrying mechanism, and we don't need much of the nice client-API layer. We also have similar Listener interface, etc. I think we can decide on one of the following options: 1. Do not depend on curator. We have some of the recipes, and some custom recipes (ZKAssign, Leader election, etc already working, locks in HBASE-5991, etc). We can also copy / fork some code from there. 2. Replace all of our zk usage / connection management to curator. We may keep the current set of API's as a thin wrapper. 3. Use our own connection management / retry logic, and build a custom CuratorFramework implementation for the curator recipes. This will keep the current zk logic/code intact, and allow us to use curator-recipes as we see fit. 4. Allow both curator and our zk layer to manage the connection. We will still have 1 connection, but 2 abstraction layers sharing it. This is the easiest to implement, but a freak show? I have a patch for 4, and now prototyping 2 or 3 whichever will be less painful. Related issues: HBASE-5547 HBASE-7305 HBASE-7212 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8809) Include deletes in the scan (setRaw) method does not respect the time range or the filter
[ https://issues.apache.org/jira/browse/HBASE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HBASE-8809: --- Description: If a row has been deleted at time stamp 'T' and a scan with time range (0, T-1) is executed, it still returns the delete marker at time stamp 'T'. It is because of the code in ScanQueryMatcher.java
{code}
if (retainDeletesInOutput
    || (!isUserScan && (EnvironmentEdgeManager.currentTimeMillis() - timestamp) <= timeToPurgeDeletes)
    || kv.getMemstoreTS() > maxReadPointToTrackVersions) {
  // always include or it is not time yet to check whether it is OK
  // to purge deletes or not
  return MatchCode.INCLUDE;
}
{code}
The assumption is that a scan (even with setRaw set to true) should respect the filters and the time range specified. Please let me know if you think this behavior can be changed so that I can provide a patch for it.
was:
If a row has been deleted at time stamp 'T' and a scan with time range (0, T-1) is executed, it still returns the delete marker at time stamp 'T'. It is because of the code in ScanQueryMatcher.java
if (retainDeletesInOutput
    || (!isUserScan && (EnvironmentEdgeManager.currentTimeMillis() - timestamp) <= timeToPurgeDeletes)
    || kv.getMemstoreTS() > maxReadPointToTrackVersions) {
  // always include or it is not time yet to check whether it is OK
  // to purge deletes or not
  return MatchCode.INCLUDE;
}
The assumption is that a scan (even with setRaw set to true) should respect the filters and the time range specified. Please let me know if you think this behavior can be changed so that I can provide a patch for it.
Include deletes in the scan (setRaw) method does not respect the time range or the filter - Key: HBASE-8809 URL: https://issues.apache.org/jira/browse/HBASE-8809 Project: HBase Issue Type: Bug Components: Scanners Reporter: Vasu Mariyala If a row has been deleted at time stamp 'T' and a scan with time range (0, T-1) is executed, it still returns the delete marker at time stamp 'T'. It is because of the code in ScanQueryMatcher.java
{code}
if (retainDeletesInOutput
    || (!isUserScan && (EnvironmentEdgeManager.currentTimeMillis() - timestamp) <= timeToPurgeDeletes)
    || kv.getMemstoreTS() > maxReadPointToTrackVersions) {
  // always include or it is not time yet to check whether it is OK
  // to purge deletes or not
  return MatchCode.INCLUDE;
}
{code}
The assumption is that a scan (even with setRaw set to true) should respect the filters and the time range specified. Please let me know if you think this behavior can be changed so that I can provide a patch for it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8089) Add type support
[ https://issues.apache.org/jira/browse/HBASE-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-8089: Attachment: hbase data types WIP.pdf Attaching my slides from the Hadoop Summit BoF talk per [~stack]'s suggestion. Add type support Key: HBASE-8089 URL: https://issues.apache.org/jira/browse/HBASE-8089 Project: HBase Issue Type: New Feature Components: Client Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.95.2 Attachments: HBASE-8089-types.txt, HBASE-8089-types.txt, HBASE-8089-types.txt, HBASE-8089-types.txt, hbase data types WIP.pdf This proposal outlines an improvement to HBase that provides for a set of types, above and beyond the existing byte-bucket strategy. This is intended to reduce user-level duplication of effort, provide better support for 3rd-party integration, and provide an overall improved experience for developers using HBase. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694262#comment-13694262 ] Nick Dimiduk commented on HBASE-8693: - bq. Should this work be in hbase-common rather than in hbase-client? Initial conversations required that the type stuff not be in common. I agree it makes more sense there, and I think that community opinion is changing. The current implementation doesn't bring in any dependencies, so it should be painless. bq. What is Order here? {{Order}} is a component from the {{OrderedBytes}} implementation (see patch on HBASE-8201). It enables users to store data sorted in ascending or descending order. Right now it's mostly a vestigial appendage; I don't know how the data types API wants to expose and consume this functionality. I'm hoping to gain insight from Phoenix, Kiji, etc. in future reviews. bq. When would I use isCoercibleTo? This comes from examination of Phoenix's {{PDataType}}. My understanding is, in the absence of secondary indices, the query planner can use type coercion to its advantage. This is the part of the data type API that I understand the least. I'm hoping for more clarity from [~giacomotaylor]. bq. I see a read on Union4 Sounds like a bug to me. bq. How I describe a Struct outside of a Struct..? Examples to follow. bq. What's a Binary? Equivalent to SQL BLOB. This is how a user can inject good old-fashioned {{byte[]}}s into a {{Struct}} or {{Union}}. bq. Do we need all these types? Great question. That conversation is happening up on HBASE-8089. My preference is no, but I think the SQL guys want more of these for better interoperability between them. Implement extensible type API based on serialization primitives --- Key: HBASE-8693 URL: https://issues.apache.org/jira/browse/HBASE-8693 Project: HBase Issue Type: Sub-task Components: Client Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.95.2 Attachments: 0001-HBASE-8693-Extensible-data-types-API.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
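A purely hypothetical usage shape for the Struct question above; the member-type names follow the spirit of the WIP patch and may not match whatever is eventually committed:
{code}
// Compose a two-field row-key type from order-aware primitives (hypothetical):
DataType[] members = new DataType[] {
    OrderedString.ASCENDING,   // e.g. an entity id, sorted ascending
    OrderedInt32.DESCENDING    // e.g. an inverted timestamp
};
Struct rowKeyType = new Struct(members);
// encode/decode would round-trip a (String, int) tuple while preserving the
// composite sort order in the serialized bytes.
{code}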
[jira] [Updated] (HBASE-8799) TestAccessController#testBulkLoad has been failing for some time on trunk/0.95
[ https://issues.apache.org/jira/browse/HBASE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8799: - Attachment: 8799.txt Add some debugging -- include toString of exception we are getting. TestAccessController#testBulkLoad has been failing for some time on trunk/0.95 -- Key: HBASE-8799 URL: https://issues.apache.org/jira/browse/HBASE-8799 Project: HBase Issue Type: Bug Components: Coprocessors, security, test Affects Versions: 0.98.0, 0.95.2 Reporter: Andrew Purtell Assignee: stack Fix For: 0.95.2 Attachments: 8799.txt I've observed this in Jenkins reports and also while I was working on HBASE-8692, only on trunk/0.95, not on 0.94: {quote} Failed tests: testBulkLoad(org.apache.hadoop.hbase.security.access.TestAccessController): Expected action to pass for user 'rwuser' but was denied {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8799) TestAccessController#testBulkLoad has been failing for some time on trunk/0.95
[ https://issues.apache.org/jira/browse/HBASE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8799: - Fix Version/s: 0.95.2 Assignee: stack TestAccessController#testBulkLoad has been failing for some time on trunk/0.95 -- Key: HBASE-8799 URL: https://issues.apache.org/jira/browse/HBASE-8799 Project: HBase Issue Type: Bug Components: Coprocessors, security, test Affects Versions: 0.98.0, 0.95.2 Reporter: Andrew Purtell Assignee: stack Fix For: 0.95.2 Attachments: 8799.txt I've observed this in Jenkins reports and also while I was working on HBASE-8692, only on trunk/0.95, not on 0.94: {quote} Failed tests: testBulkLoad(org.apache.hadoop.hbase.security.access.TestAccessController): Expected action to pass for user 'rwuser' but was denied {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8799) TestAccessController#testBulkLoad has been failing for some time on trunk/0.95
[ https://issues.apache.org/jira/browse/HBASE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694270#comment-13694270 ] stack commented on HBASE-8799: -- Committed 8799.txt TestAccessController#testBulkLoad has been failing for some time on trunk/0.95 -- Key: HBASE-8799 URL: https://issues.apache.org/jira/browse/HBASE-8799 Project: HBase Issue Type: Bug Components: Coprocessors, security, test Affects Versions: 0.98.0, 0.95.2 Reporter: Andrew Purtell Assignee: stack Fix For: 0.95.2 Attachments: 8799.txt I've observed this in Jenkins reports and also while I was working on HBASE-8692, only on trunk/0.95, not on 0.94: {quote} Failed tests: testBulkLoad(org.apache.hadoop.hbase.security.access.TestAccessController): Expected action to pass for user 'rwuser' but was denied {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8799) TestAccessController#testBulkLoad has been failing for some time on trunk/0.95
[ https://issues.apache.org/jira/browse/HBASE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8799: - Status: Patch Available (was: Open) Submitting patch to hadoopqa to see if it gets me a message too. TestAccessController#testBulkLoad has been failing for some time on trunk/0.95 -- Key: HBASE-8799 URL: https://issues.apache.org/jira/browse/HBASE-8799 Project: HBase Issue Type: Bug Components: Coprocessors, security, test Affects Versions: 0.98.0, 0.95.2 Reporter: Andrew Purtell Assignee: stack Fix For: 0.95.2 Attachments: 8799.txt I've observed this in Jenkins reports and also while I was working on HBASE-8692, only on trunk/0.95, not on 0.94: {quote} Failed tests: testBulkLoad(org.apache.hadoop.hbase.security.access.TestAccessController): Expected action to pass for user 'rwuser' but was denied {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8799) TestAccessController#testBulkLoad has been failing for some time on trunk/0.95
[ https://issues.apache.org/jira/browse/HBASE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694278#comment-13694278 ] Hadoop QA commented on HBASE-8799: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589799/8799.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6152//console This message is automatically generated. TestAccessController#testBulkLoad has been failing for some time on trunk/0.95 -- Key: HBASE-8799 URL: https://issues.apache.org/jira/browse/HBASE-8799 Project: HBase Issue Type: Bug Components: Coprocessors, security, test Affects Versions: 0.98.0, 0.95.2 Reporter: Andrew Purtell Assignee: stack Fix For: 0.95.2 Attachments: 8799.txt I've observed this in Jenkins reports and also while I was working on HBASE-8692, only on trunk/0.95, not on 0.94: {quote} Failed tests: testBulkLoad(org.apache.hadoop.hbase.security.access.TestAccessController): Expected action to pass for user 'rwuser' but was denied {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694296#comment-13694296 ] Varun Sharma commented on HBASE-8370: - So, it seems that we have per-block-type metrics from SchemaMetrics under the region server, and they are exposed at /jmx. The question is which metric we should report on the region server UI. Right now all our clusters show a 99% cache hit ratio, which is misleading, since 20% of the time there is a DataBlock miss and we are hitting disk for 20% of requests. I have been misled by this number in the past, and I think there could be others who are being similarly misled. So, should we just report another, more representative metric on the region server console? Varun Report data block cache hit rates apart from aggregate cache hit rates -- Key: HBASE-8370 URL: https://issues.apache.org/jira/browse/HBASE-8370 Project: HBase Issue Type: Improvement Components: metrics Reporter: Varun Sharma Assignee: Varun Sharma Priority: Minor Attaching from mail to d...@hbase.apache.org I am wondering whether the HBase cachingHitRatio metrics that the region server UI shows can get me a breakdown by data blocks. I always see this number to be very high, and that could be exaggerated by the fact that each lookup hits the index blocks and bloom filter blocks in the block cache before retrieving the data block. This could be artificially bloating up the cache hit ratio. Assuming the above is correct, do we already have a cache hit ratio for data blocks alone, which is more obscure? If not, my sense is that it would be pretty valuable to add one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
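Made-up numbers illustrating how index hits drown out data-block misses in the aggregate ratio:
{code}
long indexReads = 50000, indexHits = 50000;  // index/bloom blocks: 100% hit
long dataReads  = 1000,  dataHits  = 800;    // data blocks: only 80% hit
double aggregate = 100.0 * (indexHits + dataHits) / (indexReads + dataReads);  // ~99.6%
double dataOnly  = 100.0 * dataHits / dataReads;                               // 80.0%
// The UI reports the first number; the disk seeks come from the second.
{code}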
[jira] [Commented] (HBASE-8802) totalCompactingKVs overflow
[ https://issues.apache.org/jira/browse/HBASE-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694313#comment-13694313 ] Hudson commented on HBASE-8802: --- Integrated in hbase-0.95 #270 (See [https://builds.apache.org/job/hbase-0.95/270/]) HBASE-8802 totalCompactingKVs overflow (Chao Shi) (Revision 1497051) Result = FAILURE sershe : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java totalCompactingKVs overflow --- Key: HBASE-8802 URL: https://issues.apache.org/jira/browse/HBASE-8802 Project: HBase Issue Type: Bug Reporter: Chao Shi Priority: Trivial Attachments: hbase-8802.patch I happened to get a very large region (mistakenly bulk loading tons of HFiles into a single region). When it's getting compacted, the webUI shows an overflowed totalCompactingKVs. I found this is due to Compactor#FileDetails#maxKeyCount being int32. It is not a big deal, as this variable is only used for displaying compaction progress and everywhere else uses long. totalCompactingKVs=1909276739, currentCompactedKVs=11308733425, -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694320#comment-13694320 ] Elliott Clark commented on HBASE-8370: -- We don't have per-block-type metrics in trunk/95 because the overall cache hit percentage is a good proxy for the data block cache percent. Yes, the overall number is higher, but it still gives a good, actionable number. You can know if you're doing better or worse than you were before. Even better is the derivative of the cache miss count. Overall, SchemaMetrics cost HBase about 10% of its performance, and I just don't think enough people got enough out of it to keep per-cf, per-block-type metrics. Maybe we should show the percentage to more decimal figures so that it's more obvious that there are some misses? But overall, while the UI is nice, it's not what should be used for figuring these things out. That should be done by your metrics system (CM, Ganglia, OpenTSDB, etc). Report data block cache hit rates apart from aggregate cache hit rates -- Key: HBASE-8370 URL: https://issues.apache.org/jira/browse/HBASE-8370 Project: HBase Issue Type: Improvement Components: metrics Reporter: Varun Sharma Assignee: Varun Sharma Priority: Minor Attaching from mail to d...@hbase.apache.org I am wondering whether the HBase cachingHitRatio metrics that the region server UI shows can get me a breakdown by data blocks. I always see this number to be very high, and that could be exaggerated by the fact that each lookup hits the index blocks and bloom filter blocks in the block cache before retrieving the data block. This could be artificially bloating up the cache hit ratio. Assuming the above is correct, do we already have a cache hit ratio for data blocks alone, which is more obscure? If not, my sense is that it would be pretty valuable to add one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8228) Investigate time taken to snapshot memstore
[ https://issues.apache.org/jira/browse/HBASE-8228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694319#comment-13694319 ] Amitanand Aiyer commented on HBASE-8228: I think open source is using one memstore flusher. Multiple memstore flush threads were added pretty recently. This was part of the effort to reduce the time it takes to send machines out for repairs etc. Investigate time taken to snapshot memstore --- Key: HBASE-8228 URL: https://issues.apache.org/jira/browse/HBASE-8228 Project: HBase Issue Type: Sub-task Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Minor Fix For: 0.89-fb Snapshotting memstores is normally quick, but sometimes it seems to take a long time. This JIRA is to track the investigation and fix to improve the outliers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7411) Use Netflix's Curator zookeeper library
[ https://issues.apache.org/jira/browse/HBASE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694332#comment-13694332 ] Hadoop QA commented on HBASE-7411: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589796/7411v4.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestHCM org.apache.hadoop.hbase.security.access.TestAccessController {color:red}-1 core zombie tests{color}. There are 1 zombie test(s): at org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:475) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//console This message is automatically generated. Use Netflix's Curator zookeeper library --- Key: HBASE-7411 URL: https://issues.apache.org/jira/browse/HBASE-7411 Project: HBase Issue Type: New Feature Components: Zookeeper Affects Versions: 0.95.2 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.95.2 Attachments: 7411v2.txt, 7411v2.txt, 7411v3.txt, 7411v4.txt, 7411v4.txt, hbase-7411_v0.patch We have mentioned using the Curator library (https://github.com/Netflix/curator) elsewhere but we can continue the discussion in this. 
The advantage of the Curator lib over ours is the recipes. We have a very similar retrying mechanism, and we don't need much of the nice client-API layer. We also have a similar Listener interface, etc. I think we can decide on one of the following options: 1. Do not depend on Curator. We have some of the recipes, and some custom recipes (ZKAssign, leader election, etc. already working, locks in HBASE-5991, etc.). We can also copy / fork some code from there. 2. Replace all of our zk usage / connection management with Curator. We may keep the current set of APIs as a thin wrapper. 3. Use our own connection management / retry logic, and build a custom CuratorFramework implementation for the Curator recipes. This will keep the current zk logic/code intact, and allow us to use curator-recipes as we see fit. 4. Allow both Curator and our zk layer to manage the connection. We will still have 1 connection, but 2 abstraction layers sharing it. This is the easiest to implement, but a freak show? I have a patch for 4, and am now prototyping 2
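For reference, a minimal sketch of the Curator client API that options 2-4 would wrap, assuming the then-current com.netflix.curator packages; the connect string and znode path are placeholders, and nothing here is taken from the attached patches.
{code}
import com.netflix.curator.framework.CuratorFramework;
import com.netflix.curator.framework.CuratorFrameworkFactory;
import com.netflix.curator.retry.ExponentialBackoffRetry;

public class CuratorSketch {
  public static void main(String[] args) throws Exception {
    // Connection management and retries handled by Curator rather than
    // HBase's own ZooKeeperWatcher / RecoverableZooKeeper layer.
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "localhost:2181", new ExponentialBackoffRetry(1000, 3));
    client.start();
    byte[] data = client.getData().forPath("/hbase/master"); // placeholder znode
    System.out.println(new String(data));
    client.close();
  }
}
{code}
Option 3 would keep HBase's existing connection management underneath and implement only the CuratorFramework surface that the recipes need.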
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694336#comment-13694336 ] Varun Sharma commented on HBASE-8370: - bq. We don't have per block type metrics in trunk/95 because the overall cache hit percentage is a good proxy for data block cache percent. Yes the overall number is higher but it still gives a good actionable number. You can know if you're doing better or worse than you were before. Even better is the derivative of cache miss count. I am not sure this is true - this number is always 99 % for us on all clusters - blockCacheHitCachingRatio - how can a number which never changes ever be actionable ? Even with decimal places, it's never going to change because the index blocks are going to take over. Also, the difference b/w an 82 % cache hit ratio and a 99 % cache hit ratio is enormous. Controlling your p80 on latency is a *lot* easier than your p99. A cache hit ratio of 99 % just sends you this false sense of security that you have controlled your p99 latency. This is important for online serving, maybe not for enterprise. I guess we don't need to bring back SchemaMetrics to fix this, but we can have block-level metrics. At least I want to be sure that index blocks have 100 % cache hit rates, because if that's not happening, then I am in a bad situation. It would be better to not have folks using HBase for online storage play a guessing game as to what the true effectiveness of the cache is.
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694343#comment-13694343 ] Elliott Clark commented on HBASE-8370: -- bq. this number is always 99 % for us on all clusters That's why I said we need more decimal places for it. bq. Also, the difference b/w an 82 % cache hit ratio and a 99 % cache hit ratio is enormous. But that 82% doesn't tell you anything all by itself. For a given workload, is 80% good or bad? You don't know. That percentage is really only useful if you have a baseline, so it's equally informative if the cache percentage goes from 99 to 98 or from 84 to 83. Additionally, gauges are bad. They just don't tell a great story. There's a lot of lossy data there; sampling times can skew your picture of what's actually happening. See [~phobos182]'s slides (https://speakerdeck.com/phobos182/metrics-at-pinterest) on why you should prefer counters over gauges. That's why I said that the derivative of cache miss count is the best way to look at cache efficacy. It gives you an accurate count of the number of times you have to go to hdfs (not really disk since there can be os cache there). It also provides a good way to compare today to yesterday.
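A small sketch of the counter-over-gauge idea: sample the monotonically increasing miss counter twice and take the derivative yourself. The fetch helper below is a placeholder for whatever your metrics system exposes, not an HBase API.
{code}
public class MissRateSketch {
  // Placeholder source standing in for a real metrics query.
  static long fetchBlockCacheMissCount() {
    return System.nanoTime() % 100000;
  }

  public static void main(String[] args) throws InterruptedException {
    long t1 = System.currentTimeMillis();
    long m1 = fetchBlockCacheMissCount();
    Thread.sleep(60000L); // one sampling interval
    long t2 = System.currentTimeMillis();
    long m2 = fetchBlockCacheMissCount();
    // Rate of misses per second over the interval: each miss is a trip to HDFS.
    double missesPerSecond = (m2 - m1) * 1000.0 / (t2 - t1);
    System.out.println("misses/sec = " + missesPerSecond);
  }
}
{code}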
[jira] [Commented] (HBASE-8802) totalCompactingKVs overflow
[ https://issues.apache.org/jira/browse/HBASE-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694357#comment-13694357 ] Hudson commented on HBASE-8802: --- Integrated in hbase-0.95-on-hadoop2 #150 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/150/]) HBASE-8802 totalCompactingKVs overflow (Chao Shi) (Revision 1497051) Result = FAILURE sershe : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694361#comment-13694361 ] Varun Sharma commented on HBASE-8370: - Having a cache hit ratio of 80 % means that at least 80 % of my requests are fast (assuming GC is out of the picture) - in the current scenario, it may map to a number like 99.9 %, and tomorrow if I had 0 % cache hits for data blocks, the number comes down to 99.5 % - I am able to calculate this based on the numbers I pasted above. It assumes a certain distribution b/w the number of accesses to index blocks and data blocks. Tomorrow, if the distribution changes, it may well be that a 99.5 % overall cache hit ratio corresponds to a 90 % hit rate on data blocks. So, I don't think that the overall cache hit ratio is a good proxy for the data block cache hit ratio. As far as derivatives go, the miss count derivative can go up with other things like read request count - so now we would also need to take a derivative on that counter and compare, etc. On 0.94, that number has been overflowing for us all the time and is -ve; is that being fixed in trunk ? I don't think this is about counters vs gauges. I am fine with exposing counters per block type. Right now, I just don't have any insight into the block cache, which plays an important role in serving reads. When a compaction happens and new files are written, I don't know the number of cache misses for index blocks vs data blocks vs bloom blocks. I would no longer know how many data blocks are being accessed and how many index blocks, etc.
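A toy calculation along the lines Varun describes, with hypothetical counts, showing how near-perfect index hits mask a poor data-block hit rate:
{code}
public class RatioSketch {
  public static void main(String[] args) {
    // Hypothetical counts: pinned index blocks inflate the aggregate.
    long indexHits = 50000, indexReads = 50000; // index blocks: ~100% hits
    long dataHits = 200, dataReads = 1000;      // data blocks: 20% hits
    double overall = 100.0 * (indexHits + dataHits) / (indexReads + dataReads);
    double dataOnly = 100.0 * dataHits / dataReads;
    System.out.printf("overall=%.1f%% dataOnly=%.1f%%%n", overall, dataOnly);
    // prints overall=98.4% dataOnly=20.0%
  }
}
{code}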
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694377#comment-13694377 ] Elliott Clark commented on HBASE-8370: -- bq. Having a cache hit ratio of 80 % means that at least 80 % of my requests are fast I would disagree. * Full handlers * Giant gets of large amounts of data. * Gets without a proper bloom filter. * Things that skip past lots of (cached) blocks * Slow data block encoding * Slow filters * Slow network * Lock contention * GC There are TONS of other reasons that your requests can be slow. And without knowing the workload you can't tell if a cache miss is more or less likely than any other explanation. I've seen workloads where the cache percent was in the low teens and I've seen workloads where the cache percent was really 100%. There's no way a priori to know if a number is good or bad. So you again are back to using the metrics with a baseline and comparing them. For that the absolute numbers are less important. bq. As far as derivatives go, the miss count derivative can go up with other things like read request count Yep, and that makes things harder, but the only thing that's not susceptible is gauges. And like I said before, I'm trying to move us off of gauges. bq. I don't know the number of cache misses for index blocks vs data blocks vs bloom blocks. I would no longer know how many data blocks are being accessed and how many index blocks, etc. But those aren't actionable metrics. * If your bloom block cache hit count goes down you can do... Not much. Not worth counting if you can't take action on it. * With the way the index blocks work you can't cache miss them, after the first time, unless we're OOM (they aren't ever evicted, even if you turn off caching the cf). So you'll see that there are some misses on region open, and any time there's a new flush or compaction. So it will be 100%. Compaction and flush metrics are much more useful here for determining this kind of thing, so there's no need to add more metrics for something that's better covered somewhere else. * So data blocks are the only useful one, and they dominate the number of blocks requested. So this can pretty well be covered by the following: ** blockCacheExpressHitPercent ** blockCountHitPercent ** blockCacheHitCount ** blockCacheMissCount I'm -1 on adding any more metrics on the read path unless there's something that's totally missed (Jeremy brought up a couple the last time I met with him). That code is just too important to be instrumented any more for things that can be figured out other ways (and I would argue better ways, but that's less important). I'm +1 on making that cache hit percent a double so there's more accuracy.
[jira] [Commented] (HBASE-8496) Implement tags and the internals of how a tag should look like
[ https://issues.apache.org/jira/browse/HBASE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694379#comment-13694379 ] Ted Yu commented on HBASE-8496: --- On page 6: bq. In case of per HFile The sentence seems to be incomplete. bq. once we close the file we add the Meta data saying tagpresent = true and avg_tag_len = 0. avg_tag_len = 0 would indicate that there is no tag present. Why do we need two flags (tagpresent and avg_tag_len) ? Later, compaction is mentioned where tagpresent is changed to false. But we should be able to achieve this at the time of flush, right ?
{code}
byte[] tagArray = kv.getTagsArray();
Tag decodeTag = KeyValueUtil.decodeTag(tagArray);
{code}
In the above sample, I would expect decodeTag() to return more than one Tag. Would all Tags in the KeyValue be returned to filterKeyValue() ? I think it would be better if Tag.Type.Visibility is passed to decodeTag() so that only the visibility Tag is returned. Implement tags and the internals of how a tag should look like -- Key: HBASE-8496 URL: https://issues.apache.org/jira/browse/HBASE-8496 Project: HBase Issue Type: New Feature Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.98.0 Attachments: Tag design.pdf The intent of this JIRA comes from HBASE-7897. This would help us to decide on the structure and format of how the tags should look. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
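Since the tag format was still under design in this JIRA, the following is only a hypothetical sketch of walking a tags buffer laid out as repeated [2-byte total length][1-byte type][value] entries; names and layout are illustrative, not the final HBASE-8496 format.
{code}
import java.nio.ByteBuffer;

public class TagSketch {
  // Walks a hypothetical tags buffer, printing each tag's type and value length.
  public static void dump(byte[] tags) {
    ByteBuffer buf = ByteBuffer.wrap(tags);
    while (buf.remaining() >= 3) {
      short totalLen = buf.getShort();      // type byte + value bytes
      byte type = buf.get();                // e.g. a visibility tag type
      byte[] value = new byte[totalLen - 1];
      buf.get(value);
      System.out.println("type=" + type + " len=" + value.length);
    }
  }

  public static void main(String[] args) {
    ByteBuffer b = ByteBuffer.allocate(7);
    b.putShort((short) 5).put((byte) 1).put("acl!".getBytes());
    dump(b.array()); // prints: type=1 len=4
  }
}
{code}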
[jira] [Commented] (HBASE-8790) NullPointerException thrown when stopping regionserver
[ https://issues.apache.org/jira/browse/HBASE-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694384#comment-13694384 ] Ted Yu commented on HBASE-8790: --- Integrated to 0.95 and trunk. Thanks for the patch, Liang. Thanks for the review, Ram. NullPointerException thrown when stopping regionserver -- Key: HBASE-8790 URL: https://issues.apache.org/jira/browse/HBASE-8790 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.95.1 Environment: CentOS 5.9 x86_64, java version 1.6.0_45, CDH4.3 Reporter: Xiong LIU Assignee: Liang Xie Attachments: HBase-8790.txt The HBase cluster is a fresh start with one regionserver. When we stop HBase, an unhandled NullPointerException is thrown in the regionserver. The regionserver's log is as follows:
2013-06-21 10:21:11,284 INFO [regionserver61020] regionserver.HRegionServer: Closing user regions
2013-06-21 10:21:14,288 DEBUG [regionserver61020] regionserver.HRegionServer: Waiting on 1028785192
2013-06-21 10:21:14,290 FATAL [regionserver61020] regionserver.HRegionServer: ABORTING region server HOSTNAME_TEST,61020,1371781086817 : Unhandled: null java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:988) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:832) at java.lang.Thread.run(Thread.java:662)
2013-06-21 10:21:14,292 FATAL [regionserver61020] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
2013-06-21 10:21:14,293 INFO [regionserver61020] regionserver.HRegionServer: STOPPED: Unhandled: null
2013-06-21 10:21:14,293 INFO [regionserver61020] ipc.RpcServer: Stopping server on 61020
It seems that after closing user regions, the rssStub is null. Update: we found that if we set hbase.client.ipc.pool.type to RoundRobinPool (or another pool type) and hbase.client.ipc.pool.size to 10 (possibly other values) in hbase-site.xml, the regionserver continuously attempts to connect to the master, and if we stop HBase, the above NullPointerException occurs. With hbase.client.ipc.pool.size set to 1, the cluster can be completely stopped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
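The stack trace points at tryRegionServerReport dereferencing rssStub after shutdown has nulled it. A hedged sketch of the guard pattern such a fix implies (types simplified to keep the example self-contained; not the literal patch):
{code}
public class StubGuardSketch {
  interface MasterStub { void regionServerReport(); }

  private volatile MasterStub rssStub; // nulled while the server is stopping

  void tryRegionServerReport() {
    // Read the stub once into a local so a concurrent shutdown can't
    // null it between the check and the call.
    MasterStub rss = rssStub;
    if (rss == null) {
      return; // master connection already torn down; skip this report
    }
    rss.regionServerReport();
  }
}
{code}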
[jira] [Updated] (HBASE-8790) NullPointerException thrown when stopping regionserver
[ https://issues.apache.org/jira/browse/HBASE-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8790: -- Fix Version/s: 0.95.2 0.98.0 Hadoop Flags: Reviewed
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694389#comment-13694389 ] Varun Sharma commented on HBASE-8370: - We can make the hit percent a double. But if we never evict index blocks, one option is to only count DataBlocks for HitPercent, CacheHitCount, CacheMissCount. I know that is not the case for 0.94. Is that the case for trunk or can we change these metrics to only instrument data blocks then ? Anyone else have opinions ?
[jira] [Commented] (HBASE-8802) totalCompactingKVs overflow
[ https://issues.apache.org/jira/browse/HBASE-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694393#comment-13694393 ] Hudson commented on HBASE-8802: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #585 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/585/]) HBASE-8802 totalCompactingKVs overflow (Chao Shi) (Revision 1497050) Result = FAILURE sershe : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java
[jira] [Commented] (HBASE-8799) TestAccessController#testBulkLoad has been failing for some time on trunk/0.95
[ https://issues.apache.org/jira/browse/HBASE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694394#comment-13694394 ] Hudson commented on HBASE-8799: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #585 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/585/]) HBASE-8799 TestAccessController#testBulkLoad has been failing for some time on trunk/0.95 -- ADDING DEBUG (Revision 1497123) Result = FAILURE stack : Files : * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java TestAccessController#testBulkLoad has been failing for some time on trunk/0.95 -- Key: HBASE-8799 URL: https://issues.apache.org/jira/browse/HBASE-8799 Project: HBase Issue Type: Bug Components: Coprocessors, security, test Affects Versions: 0.98.0, 0.95.2 Reporter: Andrew Purtell Assignee: stack Fix For: 0.95.2 Attachments: 8799.txt I've observed this in Jenkins reports and also while I was working on HBASE-8692, only on trunk/0.95, not on 0.94: {quote} Failed tests: testBulkLoad(org.apache.hadoop.hbase.security.access.TestAccessController): Expected action to pass for user 'rwuser' but was denied {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8410) Basic quota support for namespaces
[ https://issues.apache.org/jira/browse/HBASE-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694401#comment-13694401 ] Ted Yu commented on HBASE-8410: --- For NamespaceController, please add class javadoc and an audience annotation.
{code}
+zkManager = new ZKNamespaceManager(zk);
+zkManager.start();
{code}
I saw the following in TableNamespaceManager:
{code}
+zkNamespaceManager = new ZKNamespaceManager(masterServices.getZooKeeper());
+zkNamespaceManager.start();
{code}
So we may have more than one ZKNamespaceManager running in the same JVM ? The spacing is slightly off: two spaces should be used per indentation level.
{code}
+maxRegions = Long.parseLong(value);
+ } catch (NumberFormatException exp) {
+throw new ConstraintException("NumberFormatException while getting max regions.", exp);
{code}
Please include value in the exception message.
{code}
+currentStatus = getNamespaceQuota(ctx.getEnvironment().getConfiguration(),
+ nspdesc.getName());
{code}
I think there should be a better name for getNamespaceQuota, because the quota should be the setting governing the namespace, which is different from the current status of the underlying namespace. I wonder if using table and region count is a good way of enforcing quota, because the underlying region size can vary.
{code}
++ " is not allowed to have " + regions.length
++ " number of regions. The total number of regions permitted are only "
{code}
Remove 'number of '. 'permitted are only' -> 'permitted is only'. For class NamespaceQuota, please add javadoc and an audience annotation. I think the class should be renamed because it reflects the status of a namespace, not its quota. Basic quota support for namespaces -- Key: HBASE-8410 URL: https://issues.apache.org/jira/browse/HBASE-8410 Project: HBase Issue Type: Sub-task Reporter: Francis Liu Assignee: Vandana Ayyalasomayajula Attachments: HBASE_8410.patch This task involves creating an observer which provides basic quota support to namespaces in terms of (1) number of tables and (2) number of regions. The quota support can be enabled by setting the following in hbase-site.xml:
{code}
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.namespace.NamespaceController</value>
</property>
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.hadoop.hbase.namespace.NamespaceController</value>
</property>
{code}
To add quotas to a namespace, properties need to be added while creating the namespace. Examples: 1. namespace_create 'ns1', {'hbase.namespace.quota.maxregion'=>'10'} 2. namespace_create 'ns2', {'hbase.namespace.quota.maxtables'=>'2'}, {'hbase.namespace.quota.maxregion'=>'5'} The quotas can be modified/added to the namespace at any point of time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
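On the request to include the value in the exception message, a small sketch of what the reworked catch block could look like (ConstraintException swapped for IllegalArgumentException so the snippet is self-contained):
{code}
public class QuotaParseSketch {
  static long parseMaxRegions(String value) {
    try {
      return Long.parseLong(value);
    } catch (NumberFormatException exp) {
      // Review suggestion: surface the offending value in the message.
      throw new IllegalArgumentException(
          "NumberFormatException while getting max regions (value=" + value + ")", exp);
    }
  }
}
{code}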
[jira] [Updated] (HBASE-8802) totalCompactingKVs may overflow
[ https://issues.apache.org/jira/browse/HBASE-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8802: -- Fix Version/s: 0.95.2 0.98.0 Summary: totalCompactingKVs may overflow (was: totalCompactingKVs overflow) Hadoop Flags: Reviewed
[jira] [Assigned] (HBASE-8802) totalCompactingKVs may overflow
[ https://issues.apache.org/jira/browse/HBASE-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-8802: - Assignee: Chao Shi
[jira] [Created] (HBASE-8811) REST service ignores misspelled check= parameter, causing unexpected mutations
Chip Salzenberg created HBASE-8811: -- Summary: REST service ignores misspelled check= parameter, causing unexpected mutations Key: HBASE-8811 URL: https://issues.apache.org/jira/browse/HBASE-8811 Project: HBase Issue Type: Bug Components: REST Affects Versions: 0.95.1 Reporter: Chip Salzenberg Priority: Critical In rest.RowResource.update(), this code keeps executing a request if a misspelled check= parameter is provided.
{noformat}
if (CHECK_PUT.equalsIgnoreCase(check)) {
  return checkAndPut(model);
} else if (CHECK_DELETE.equalsIgnoreCase(check)) {
  return checkAndDelete(model);
} else if (check != null && check.length() > 0) {
  LOG.warn("Unknown check value: " + check + ", ignored");
}
{noformat}
By my reading of the code, this results in the provided cell value that was intended as a check instead being treated as a mutation, which is sure to destroy user data. Thus the priority of this bug, as it can cause corruption. I suggest that a better reaction than a warning would be, approximately:
{noformat}
return Response.status(Response.Status.BAD_REQUEST)
    .type(MIMETYPE_TEXT).entity("Invalid check value '" + check + "'")
    .build();
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694433#comment-13694433 ] Varun Sharma commented on HBASE-8370: - Also, coming back to the point about the metrics being actionable. bq. If your bloom block cache hit count goes down you can do... Not much. Not worth counting if you can't take action on it. I disagree that it's not actionable. I would go fix the block cache in this case. It means there is something seriously wrong with our implementation of the block cache if we are evicting bloom blocks - maybe it's just me, but I feel we should not be evicting bloom blocks. If the cache hit rate is too low on data blocks, the action item is to increase the block cache size. I would agree that index block metrics are not needed or actionable if it is indeed the case that we pin index blocks forever.
[jira] [Created] (HBASE-8812) Avoid a wide line on the HMaster webUI if we have more zookeeper quorums
Fengdong Yu created HBASE-8812: -- Summary: Avoid a wide line on the HMaster webUI if we have more zookeeper quorums Key: HBASE-8812 URL: https://issues.apache.org/jira/browse/HBASE-8812 Project: HBase Issue Type: Improvement Components: master Reporter: Fengdong Yu Priority: Minor -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8812) Avoid a wide line on the HMaster webUI if we have more zookeeper quorums
[ https://issues.apache.org/jira/browse/HBASE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated HBASE-8812: --- Description: add a line break for every four zookeeper quorums on the HMaster webUI. Avoid a wide line on the HMaster webUI if we have more zookeeper quorums Key: HBASE-8812 URL: https://issues.apache.org/jira/browse/HBASE-8812 Project: HBase Issue Type: Improvement Components: master Reporter: Fengdong Yu Priority: Minor add a line break for every four zookeeper quorums on the HMaster webUI. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8812) Avoid a wide line on the HMaster webUI if we have more zookeeper quorums
[ https://issues.apache.org/jira/browse/HBASE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated HBASE-8812: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-8790) NullPointerException thrown when stopping regionserver
[ https://issues.apache.org/jira/browse/HBASE-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694442#comment-13694442 ] Hudson commented on HBASE-8790: --- Integrated in hbase-0.95 #271 (See [https://builds.apache.org/job/hbase-0.95/271/]) HBASE-8790 NullPointerException thrown when stopping regionserver (Liang Xie) (Revision 1497172) Result = FAILURE tedyu : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
[jira] [Updated] (HBASE-8812) Avoid a wide line on the HMaster webUI if we have more zookeeper quorums
[ https://issues.apache.org/jira/browse/HBASE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated HBASE-8812: --- Attachment: HBASE-8812.patch
[jira] [Updated] (HBASE-8812) Avoid a wide line on the HMaster webUI if we have more zookeeper quorums
[ https://issues.apache.org/jira/browse/HBASE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated HBASE-8812: --- Description: add a line break for every four zookeeper quorums on the HMaster webUI. I don't think this needs a test case; manual testing is enough. I've tested on my test cluster and everything works well. (was: add a line break for every four zookeeper quorums on the HMaster webUI.)
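A hypothetical helper matching the described change: insert an HTML line break after every fourth server in the quorum string so the master UI row stays narrow. Names are illustrative, not from the attached patch.
{code}
public class QuorumFormatSketch {
  static String format(String quorum) {
    String[] servers = quorum.split(",");
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < servers.length; i++) {
      sb.append(servers[i]);
      if (i < servers.length - 1) {
        sb.append(",");
        if (i % 4 == 3) sb.append("<br/>"); // break after every 4th server
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // prints: zk1,zk2,zk3,zk4,<br/>zk5,zk6
    System.out.println(format("zk1,zk2,zk3,zk4,zk5,zk6"));
  }
}
{code}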
[jira] [Assigned] (HBASE-8808) Use Jacoco to generate Unit Test coverage reports
[ https://issues.apache.org/jira/browse/HBASE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju reassigned HBASE-8808: - Assignee: Manukranth Kolloju Use Jacoco to generate Unit Test coverage reports - Key: HBASE-8808 URL: https://issues.apache.org/jira/browse/HBASE-8808 Project: HBase Issue Type: Bug Reporter: Manukranth Kolloju Assignee: Manukranth Kolloju Priority: Trivial Enabling the code coverage tool jacoco in maven -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HBASE-8808) Use Jacoco to generate Unit Test coverage reports
[ https://issues.apache.org/jira/browse/HBASE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-8808 started by Manukranth Kolloju.
[jira] [Updated] (HBASE-8808) Use Jacoco to generate Unit Test coverage reports
[ https://issues.apache.org/jira/browse/HBASE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju updated HBASE-8808: -- Attachment: Screen Shot 2013-06-25 at 11.35.30 AM.png Attaching a sample report of the test coverage in our test suite.
[jira] [Commented] (HBASE-8808) Use Jacoco to generate Unit Test coverage reports
[ https://issues.apache.org/jira/browse/HBASE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694460#comment-13694460 ] Manukranth Kolloju commented on HBASE-8808: --- The unit test coverage doesn't take too long: roughly, unit test execution time is increased by 10-20%. The report generation itself is quick once the execution data is collected. A very nice feature of this tool is that it shows branch coverage, unlike other code coverage tools out there. I can create a patch and port it to the open source branch if people are interested in this tool and if it is possible.
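For readers wanting to try this, a hedged sketch of the usual jacoco-maven-plugin wiring in a pom.xml; the version and goal bindings are typical for mid-2013 JaCoCo and are assumed rather than taken from the FB-branch change.
{code}
<!-- Hedged sketch: typical jacoco-maven-plugin wiring circa mid-2013.
     Version and goal bindings assumed, not taken from the patch. -->
<plugin>
  <groupId>org.jacoco</groupId>
  <artifactId>jacoco-maven-plugin</artifactId>
  <version>0.6.3.201306030806</version>
  <executions>
    <execution>
      <id>prepare-agent</id>
      <goals><goal>prepare-agent</goal></goals>
    </execution>
    <execution>
      <id>report</id>
      <phase>test</phase>
      <goals><goal>report</goal></goals>
    </execution>
  </executions>
</plugin>
{code}
With this in the build section, running mvn test instruments the tests via the agent, writes the execution data, and generates an HTML report that includes line and branch coverage.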
[jira] [Resolved] (HBASE-8808) Use Jacoco to generate Unit Test coverage reports
[ https://issues.apache.org/jira/browse/HBASE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju resolved HBASE-8808. --- Resolution: Fixed Hadoop Flags: Reviewed
[jira] [Updated] (HBASE-8808) Use Jacoco to generate Unit Test coverage reports
[ https://issues.apache.org/jira/browse/HBASE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju updated HBASE-8808: -- Component/s: build Affects Version/s: 0.89-fb Fix Version/s: 0.89-fb Remaining Estimate: 24h Original Estimate: 24h
[jira] [Resolved] (HBASE-8491) Fixing the TestHeapSizes.
[ https://issues.apache.org/jira/browse/HBASE-8491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju resolved HBASE-8491. --- Resolution: Fixed Fixing the TestHeapSizes. - Key: HBASE-8491 URL: https://issues.apache.org/jira/browse/HBASE-8491 Project: HBase Issue Type: Bug Affects Versions: 0.89-fb Reporter: Manukranth Kolloju Assignee: Manukranth Kolloju Priority: Trivial Fix For: 0.89-fb Accounting for the extra references added. Did an absolute count of non-static variables and updated accordingly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-8491) Fixing the TestHeapSizes.
[ https://issues.apache.org/jira/browse/HBASE-8491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju reassigned HBASE-8491: - Assignee: Manukranth Kolloju