[jira] [Commented] (HBASE-8789) Add max RPC version to meta-region-server zk node.
[ https://issues.apache.org/jira/browse/HBASE-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693688#comment-13693688 ] Elliott Clark commented on HBASE-8789: -- It's still 0. I think the 2 you're seeing is the field index in the protobuf definition. I fill out the rpcVersion with RPC_CURRENT_VERSION, which still seems to be 0. Add max RPC version to meta-region-server zk node. -- Key: HBASE-8789 URL: https://issues.apache.org/jira/browse/HBASE-8789 Project: HBase Issue Type: Bug Components: IPC/RPC, Zookeeper Affects Versions: 0.98.0, 0.95.1 Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-8789-0.patch, HBASE-8789-1.patch For clients to bootstrap themselves they need to know the max RPC version that the meta server will accept. We should add that to the zookeeper node.
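For anyone puzzled by where a 2 could show up when the shipped version is 0: in protobuf, the number after the equals sign in a field declaration is the field index, and it is that index (folded into the wire-format tag) that appears in the serialized bytes, not the value. A minimal sketch of the encoding rule, assuming a declaration shaped like optional uint32 rpc_version = 2 (the exact .proto layout here is an assumption, not quoted from the patch):
{code}
public class ProtoTagSketch {
  public static void main(String[] args) {
    // Assumed declaration (not verified against the actual .proto):
    //   optional uint32 rpc_version = 2;
    // The "= 2" is the field index, not a value.
    int fieldNumber = 2;                     // field index from the .proto
    int wireType = 0;                        // 0 = varint
    int tag = (fieldNumber << 3) | wireType; // the key byte actually serialized
    int value = 0;                           // the value shipped can still be 0
    System.out.println("tag=0x" + Integer.toHexString(tag) + ", value=" + value);
  }
}
{code}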
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693798#comment-13693798 ] Varun Sharma commented on HBASE-8370: - Here are some stats for this JIRA - I am arguing that the BlockCacheHit ratio number reported on a region server does not mean much.
tbl.feeds.cf.home.bt.Index.fsBlockReadCnt : 46864
tbl.feeds.cf.home.bt.Index.fsBlockReadCacheHitCnt : 46864
Index block cache hit ratio = 100%
tbl.feeds.cf.home.bt.Data.fsBlockReadCacheHitCnt : 202
tbl.feeds.cf.home.bt.Data.fsBlockReadCnt : 247
Data block cache hit ratio = 82%
Overall cache hit ratio = (46864 + 202) / (46864 + 247) = 99%
Since indexes are hit often, their cache hit rate is 100% and the number of hits is high. The real number we care about is 82%, the hit rate on the data blocks. However, we continue to show 99% on the region server console instead. I think we need to fix that number. Please let me know if folks object to this. Report data block cache hit rates apart from aggregate cache hit rates -- Key: HBASE-8370 URL: https://issues.apache.org/jira/browse/HBASE-8370 Project: HBase Issue Type: Improvement Components: metrics Reporter: Varun Sharma Assignee: Varun Sharma Priority: Minor Attaching from mail to d...@hbase.apache.org I am wondering whether the HBase cachingHitRatio metric that the region server UI shows can give me a breakdown by data blocks. I always see this number to be very high, and that could be exaggerated by the fact that each lookup hits the index blocks and bloom filter blocks in the block cache before retrieving the data block. This could be artificially bloating the cache hit ratio. Assuming the above is correct, do we already have a cache hit ratio for data blocks alone which is more obscure? If not, my sense is that it would be pretty valuable to add one.
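Spelling out the arithmetic, since it is the whole argument: the aggregate ratio is dominated by index-block hits. A self-contained sketch using the counter values quoted above (class and variable names are illustrative, not the real metrics code):
{code}
public class CacheHitRatios {
  public static void main(String[] args) {
    long indexHits = 46864, indexReads = 46864;  // Index.fsBlockReadCacheHitCnt / Cnt
    long dataHits = 202, dataReads = 247;        // Data.fsBlockReadCacheHitCnt / Cnt

    double indexRatio = 100.0 * indexHits / indexReads;                           // 100%
    double dataRatio = 100.0 * dataHits / dataReads;                              // ~82%
    double aggregate = 100.0 * (indexHits + dataHits) / (indexReads + dataReads); // ~99%

    // The aggregate is dominated by the cheap, always-hot index blocks,
    // which is why reporting it alone hides the ~82% data-block hit rate.
    System.out.printf("index=%.0f%% data=%.0f%% aggregate=%.0f%%%n",
        indexRatio, dataRatio, aggregate);
  }
}
{code}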
[jira] [Commented] (HBASE-8802) totalCompactingKVs overflow
[ https://issues.apache.org/jira/browse/HBASE-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693838#comment-13693838 ] Chao Shi commented on HBASE-8802: - It seems like the TestAccessController failure has nothing to do with this patch. It still fails when I run it on my box without this patch. totalCompactingKVs overflow --- Key: HBASE-8802 URL: https://issues.apache.org/jira/browse/HBASE-8802 Project: HBase Issue Type: Bug Reporter: Chao Shi Priority: Trivial Attachments: hbase-8802.patch I happened to get a very large region (by mistakenly bulk loading tons of HFiles into a single region). When it was getting compacted, the web UI showed an overflowed totalCompactingKVs: totalCompactingKVs=1909276739, currentCompactedKVs=11308733425. I found this is due to Compactor#FileDetails#maxKeyCount being an int32. It is not a big deal, as this variable is only used for displaying compaction progress; everywhere else uses long.
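For readers unfamiliar with the failure mode: accumulating per-file key counts into an int silently wraps once the sum passes 2^31 - 1, so the displayed total can end up smaller than the KVs already compacted. A minimal sketch with made-up file key counts (not the actual Compactor code):
{code}
public class KeyCountOverflow {
  public static void main(String[] args) {
    long[] fileKeyCounts = {6_000_000_000L, 5_308_733_425L}; // hypothetical HFile key counts

    int narrowTotal = 0;  // the bug: int32, like Compactor#FileDetails#maxKeyCount
    long wideTotal = 0L;  // the fix: use long throughout

    for (long kv : fileKeyCounts) {
      narrowTotal += (int) kv;  // wraps silently
      wideTotal += kv;
    }
    // narrowTotal prints a garbage (here negative) value,
    // while wideTotal shows the real figure.
    System.out.println("int total:  " + narrowTotal);
    System.out.println("long total: " + wideTotal);
  }
}
{code}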
[jira] [Updated] (HBASE-8496) Implement tags and the internals of how a tag should look like
[ https://issues.apache.org/jira/browse/HBASE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-8496: -- Attachment: Tag design.pdf Attaching a simple design document that describes how tags will be supported by HBase and the advantage of using KeyValueCodec. It also touches on how tags can be implemented in an optional way when we don't go with KeyValueCodec. Please feel free to share your comments/reviews. Thanks to Andy and Anoop for their reviews/suggestions. Implement tags and the internals of how a tag should look like -- Key: HBASE-8496 URL: https://issues.apache.org/jira/browse/HBASE-8496 Project: HBase Issue Type: New Feature Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.98.0 Attachments: Tag design.pdf The intent of this JIRA comes from HBASE-7897. This would help us to decide on the structure and format of how the tags should look.
[jira] [Commented] (HBASE-8793) Regionserver ubuntu's startup script return code always 0
[ https://issues.apache.org/jira/browse/HBASE-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693955#comment-13693955 ] Jean-Marc Spaggiari commented on HBASE-8793: Hi [~stack], Thanks for that! However, I tried to close 8803, to modify one comment, and to change the status, but I'm still not able to. I can modify the few attributes, but not the resolution nor the status. Same in this defect: not able to modify those fields. Regionserver ubuntu's startup script return code always 0 - Key: HBASE-8793 URL: https://issues.apache.org/jira/browse/HBASE-8793 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.94.6 Environment: Ubuntu 12.04.2 LTS, HBase 0.94.6+96-1.cdh4.3.0.p0.13~precise-cdh4.3.0 Reporter: Michael Czerwiński Assignee: Jean-Marc Spaggiari Priority: Minor The hbase-regionserver startup script always returns 0 ('exit 0' at the end of the script); this is wrong behaviour which causes issues when trying to recognise the true status of the service. Replacing it with 'exit $?' seems to fix the problem; looking at hbase-master, return codes are assigned to a RETVAL variable which is used with exit. Not sure if the problem exists in other versions.
{noformat}
/etc/init.d/hbase-regionserver.orig status
hbase-regionserver is not running.
echo $?
0
{noformat}
After fix:
{noformat}
/etc/init.d/hbase-regionserver status
hbase-regionserver is not running.
echo $?
1
{noformat}
[jira] [Commented] (HBASE-8716) Fixups/Improvements for graceful_stop.sh/region_mover.rb
[ https://issues.apache.org/jira/browse/HBASE-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693965#comment-13693965 ] Jean-Marc Spaggiari commented on HBASE-8716: I did some changes (parallelizing the moves) which make it 5 times faster. It might be even faster on big clusters. I will push the patch today for comments. Fixups/Improvements for graceful_stop.sh/region_mover.rb Key: HBASE-8716 URL: https://issues.apache.org/jira/browse/HBASE-8716 Project: HBase Issue Type: Improvement Reporter: stack Assignee: stack Fix For: 0.95.2 Attachments: 8716.txt It is a while since these scripts were touched. Giving them a spring cleaning and seeing if I can make them return error codes on failure (it seems like the previous style was that the operator would watch the output and react to it, but I see cases where tools want to call these scripts and want a return code to indicate whether the rolling upgrade worked or not). Also, seeing if I can make the rolling restart faster, since one-by-one, while minimally disruptive and 'safe', is slow on clusters of hundreds of nodes.
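region_mover.rb is JRuby, so the parallelization JM describes does not look exactly like this, but the idea maps onto a plain thread pool. A rough Java sketch, where moveRegion() is a hypothetical stand-in for the script's per-region move-and-verify step:
{code}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelMoverSketch {
  // Hypothetical stand-in for the script's move logic:
  // unassign from the source, assign to dest, wait for the region to open.
  static void moveRegion(String encodedRegionName, String destServer) { /* ... */ }

  static void moveAll(List<String> regions, List<String> servers, int threads)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int i = 0; i < regions.size(); i++) {
      final String region = regions.get(i);
      // Round-robin over the destination servers: one region per server per pass.
      final String dest = servers.get(i % servers.size());
      pool.submit(() -> moveRegion(region, dest));
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }
}
{code}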
[jira] [Commented] (HBASE-8793) Regionserver ubuntu's startup script return code always 0
[ https://issues.apache.org/jira/browse/HBASE-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694006#comment-13694006 ] stack commented on HBASE-8793: -- Yet... Regionserver ubuntu's startup script return code always 0 - Key: HBASE-8793 URL: https://issues.apache.org/jira/browse/HBASE-8793
[jira] [Commented] (HBASE-8793) Regionserver ubuntu's startup script return code always 0
[ https://issues.apache.org/jira/browse/HBASE-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694005#comment-13694005 ] stack commented on HBASE-8793: -- [~jmspaggi] Try again (smile); I added you to the 'committers' list though you ain't.. Regionserver ubuntu's startup script return code always 0 - Key: HBASE-8793 URL: https://issues.apache.org/jira/browse/HBASE-8793
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694007#comment-13694007 ] stack commented on HBASE-8370: -- [~eclark] Mighty Elliott... input? Report data block cache hit rates apart from aggregate cache hit rates -- Key: HBASE-8370 URL: https://issues.apache.org/jira/browse/HBASE-8370
[jira] [Commented] (HBASE-8789) Add max RPC version to meta-region-server zk node.
[ https://issues.apache.org/jira/browse/HBASE-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694008#comment-13694008 ] stack commented on HBASE-8789: -- Duh. Yes. +1 on patch. Add max RPC version to meta-region-server zk node. -- Key: HBASE-8789 URL: https://issues.apache.org/jira/browse/HBASE-8789
[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-8803: --- Affects Version/s: 0.98.0 0.94.8 0.95.1 region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.94.8, 0.95.1 Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Original Estimate: 48h Remaining Estimate: 48h When there are many regions in a cluster, rolling_restart can take hours because region_mover moves the regions one by one.
[jira] [Commented] (HBASE-8793) Regionserver ubuntu's startup script return code always 0
[ https://issues.apache.org/jira/browse/HBASE-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694015#comment-13694015 ] Jean-Marc Spaggiari commented on HBASE-8793: ;) Thanks for the yet ;) I just retried and sent you an email about the result. Regionserver ubuntu's startup script return code always 0 - Key: HBASE-8793 URL: https://issues.apache.org/jira/browse/HBASE-8793
[jira] [Commented] (HBASE-8228) Investigate time taken to snapshot memstore
[ https://issues.apache.org/jira/browse/HBASE-8228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694028#comment-13694028 ] Amitanand Aiyer commented on HBASE-8228: Looks like this is caused when we are using multiple memstore-flusher threads and two flush requests have a log-roll between them. To flush a region, we grab the HRegion.updatesLock.writeLock and then try to grab the HLog.cacheFlushLock.readLock(). Most of the work that happens within the lock is done in memory, so it should only take a short time ... unless we are waiting to grab the lock. HLog.rollWriter tries to grab the HLog.cacheFlushLock.writeLock(). This means that a log-roll cannot happen while a flush is already in progress. If a second flush is initiated when there is already a flush going on and a log-roll waiting (for the writer's lock), then the second flush is able to get the HRegion.updatesLock.writeLock (presumably for a different region) but will stall on the HLog.cacheFlushLock.readLock(). This is because the reader-writer lock implementation, which uses NonFairSync, makes reader lock requests wait behind the writer's request if the writer is at the head of the queue. This interleaving results in the second flush request holding the HRegion.updatesLock.writeLock() for as long as the first thread took to flush a region plus do a log roll. Swapping the order of the HRegion.updatesLock.writeLock() and startCacheFlush should probably fix this issue. Reducing the number of memstore flusher threads to 1 can also stop this behavior. Investigate time taken to snapshot memstore --- Key: HBASE-8228 URL: https://issues.apache.org/jira/browse/HBASE-8228 Project: HBase Issue Type: Sub-task Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Minor Fix For: 0.89-fb Snapshotting memstores is normally quick, but sometimes it seems to take long. This JIRA is to track the investigation and fix to improve the outliers.
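The interleaving described above is reproducible with a bare ReentrantReadWriteLock: once a writer is queued, a new read-lock request parks behind it even though another reader still holds the lock. A minimal, self-contained demo (thread roles mirror flush/log-roll; timings are illustrative):
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReaderBehindWriterDemo {
  public static void main(String[] args) throws InterruptedException {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair, like cacheFlushLock

    Thread flush1 = new Thread(() -> {   // first flush: holds the read lock a while
      lock.readLock().lock();
      try { Thread.sleep(3000); } catch (InterruptedException ignored) {}
      lock.readLock().unlock();
    });

    Thread logRoll = new Thread(() -> {  // log roll: queues for the write lock
      lock.writeLock().lock();
      lock.writeLock().unlock();
    });

    Thread flush2 = new Thread(() -> {   // second flush: a mere read lock...
      long t = System.currentTimeMillis();
      lock.readLock().lock();            // ...but it parks behind the queued writer
      System.out.println("second reader waited "
          + (System.currentTimeMillis() - t) + " ms");
      lock.readLock().unlock();
    });

    flush1.start();
    Thread.sleep(100);                   // let flush1 grab the read lock
    logRoll.start();
    Thread.sleep(100);                   // let logRoll queue for the write lock
    flush2.start();                      // typically prints ~2800 ms, not ~0
    flush1.join(); logRoll.join(); flush2.join();
  }
}
{code}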
[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-8803: --- Attachment: HBASE-8803-v0-trunk.patch region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694032#comment-13694032 ] Andrew Purtell commented on HBASE-8370: --- bq. I think we need to fix that number. +1 Configuration settings can change how we handle the different block types on a per-type basis. That's only half the story if we do not have per-block-type metrics too. Report data block cache hit rates apart from aggregate cache hit rates -- Key: HBASE-8370 URL: https://issues.apache.org/jira/browse/HBASE-8370
[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-8803: --- Status: Patch Available (was: Open) So here is what I did. First, when regions are unloaded from the server about to be restarted, instead of moving them region by region and randomly, the script now does it in round-robin mode, assigning one region per RS. So if there are 20 RS in the cluster, one being unloaded, it will move the regions 19 by 19! Then, to restore the regions, instead of doing it one by one, it now does it 10 by 10. As a result, the rolling restart now takes 16 minutes on my cluster instead of 74 minutes. And the bigger the cluster is, the faster it will be. This version is for review only; open to comments. I have tested it on 0.94, but I don't have a cluster running with trunk, so I'm not able to test it there... region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803
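A sketch of the reload half of that description: restore regions in slices of 10 and wait for each slice to finish before starting the next. The helper names are hypothetical, not the patch's actual code (which is Ruby):
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchedReloadSketch {
  static final int BATCH = 10; // regions restored concurrently, per JM's numbers

  // Hypothetical stand-in for moving one region back to the restarted server.
  static void restoreRegion(String encodedRegionName, String homeServer) { /* ... */ }

  static void reload(List<String> regions, String homeServer) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(BATCH);
    for (int start = 0; start < regions.size(); start += BATCH) {
      List<String> slice = regions.subList(start, Math.min(start + BATCH, regions.size()));
      List<Future<?>> inFlight = new ArrayList<>();
      for (String r : slice) {
        inFlight.add(pool.submit(() -> restoreRegion(r, homeServer)));
      }
      for (Future<?> f : inFlight) f.get(); // wait for the slice before the next one
    }
    pool.shutdown();
  }
}
{code}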
[jira] [Reopened] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reopened HBASE-8370: -- Report data block cache hit rates apart from aggregate cache hit rates -- Key: HBASE-8370 URL: https://issues.apache.org/jira/browse/HBASE-8370
[jira] [Resolved] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-8370. -- Resolution: Fixed Report data block cache hit rates apart from aggregate cache hit rates -- Key: HBASE-8370 URL: https://issues.apache.org/jira/browse/HBASE-8370
[jira] [Reopened] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari reopened HBASE-8803: region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803
[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-8803: --- Resolution: Fixed Status: Resolved (was: Patch Available) region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803
[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8803: - Component/s: Usability region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803
[jira] [Commented] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694044#comment-13694044 ] stack commented on HBASE-8803: -- Can you make this new behavior optional? Or at least make how many to do at a time a parameter (max concurrent threads moving and restoring)? Folks want the fast rolling restart for sure -- I know for a fact that a few of your customers need it (smile) -- but the good thing about the old behavior is that it was minimally disruptive (though slow), so it is good for a cluster that is doing hard serving. Thanks for working on this important operational issue [~jmspaggi] region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803
[jira] [Commented] (HBASE-8228) Investigate time taken to snapshot memstore
[ https://issues.apache.org/jira/browse/HBASE-8228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694064#comment-13694064 ] Himanshu Vashishtha commented on HBASE-8228: bq. Swapping the order of the HRegion.updatesLock.writeLock(), and startCacheFlush should probably fix this issue. So, you don't lock the region until you get the cacheFlushLock.readLock(). In 0.94, cacheFlushLock is still a ReentrantLock. I wonder whether multiple memstore flush threads help there at all (if multiple flushers are there in 0.94). In trunk, we no longer write flush events to hlog. Basically, a flush can happen while log rolling is going on. Investigate time taken to snapshot memstore --- Key: HBASE-8228 URL: https://issues.apache.org/jira/browse/HBASE-8228
[jira] [Commented] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694070#comment-13694070 ] Hadoop QA commented on HBASE-8803: --
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589763/HBASE-8803-v0-trunk.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.security.access.TestAccessController
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6146//console
This message is automatically generated.
region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803
[jira] [Commented] (HBASE-8790) NullPointerException thrown when stopping regionserver
[ https://issues.apache.org/jira/browse/HBASE-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694087#comment-13694087 ] ramkrishna.s.vasudevan commented on HBASE-8790: --- +1 on patch. NullPointerException thrown when stopping regionserver -- Key: HBASE-8790 URL: https://issues.apache.org/jira/browse/HBASE-8790 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.95.1 Environment: CentOS 5.9 x86_64, java version 1.6.0_45, CDH4.3 Reporter: Xiong LIU Assignee: Liang Xie Attachments: HBase-8790.txt The HBase cluster is a fresh start with one regionserver. When we stop HBase, an unhandled NullPointerException is thrown in the regionserver. The regionserver's log is as follows:
{noformat}
2013-06-21 10:21:11,284 INFO [regionserver61020] regionserver.HRegionServer: Closing user regions
2013-06-21 10:21:14,288 DEBUG [regionserver61020] regionserver.HRegionServer: Waiting on 1028785192
2013-06-21 10:21:14,290 FATAL [regionserver61020] regionserver.HRegionServer: ABORTING region server HOSTNAME_TEST,61020,1371781086817 : Unhandled: null
java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:988)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:832)
        at java.lang.Thread.run(Thread.java:662)
2013-06-21 10:21:14,292 FATAL [regionserver61020] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
2013-06-21 10:21:14,293 INFO [regionserver61020] regionserver.HRegionServer: STOPPED: Unhandled: null
2013-06-21 10:21:14,293 INFO [regionserver61020] ipc.RpcServer: Stopping server on 61020
{noformat}
It seems that after closing user regions, the rssStub is null. Update: we found that if we set hbase.client.ipc.pool.type to RoundRobinPool (or another pool type) and hbase.client.ipc.pool.size to 10 (possibly other values) in hbase-site.xml, the regionserver continuously attempts to connect to the master, and if we stop HBase, the above NullPointerException occurs. With hbase.client.ipc.pool.size set to 1, the cluster can be stopped completely.
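The usual fix pattern for this kind of NPE is to copy the stub into a local variable and null-check it before use, since the stop path can clear the field at any time. A generic, runnable sketch of that pattern (the Runnable stands in for the real master-report stub; this is not the attached patch):
{code}
import java.util.concurrent.atomic.AtomicReference;

public class StubNullCheckSketch {
  // Stand-in for HRegionServer.rssStub: a stub that the stop path clears.
  static final AtomicReference<Runnable> rssStub = new AtomicReference<>();

  static void tryRegionServerReport() {
    Runnable rss = rssStub.get(); // copy once; the field may be cleared concurrently
    if (rss == null) {
      return;                     // stopping / not yet connected: skip the report
    }
    rss.run();                    // safe: our local copy cannot become null
  }

  public static void main(String[] args) {
    tryRegionServerReport();      // no NPE even though the stub is unset
    rssStub.set(() -> System.out.println("report sent"));
    tryRegionServerReport();
  }
}
{code}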
[jira] [Updated] (HBASE-8737) [replication] Change replication RPC to use cell blocks
[ https://issues.apache.org/jira/browse/HBASE-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8737: - Attachment: 0001-HBASE-8737-replication-Change-replication-RPC-to-use.patch Here is a patch that does both sides. Trying against hadoopqa. [replication] Change replication RPC to use cell blocks --- Key: HBASE-8737 URL: https://issues.apache.org/jira/browse/HBASE-8737 Project: HBase Issue Type: Improvement Components: Replication Reporter: Chris Trezzo Assignee: stack Priority: Critical Fix For: 0.95.2 Attachments: 0001-HBASE-8737-replication-Change-replication-RPC-to-use.patch, 8737.txt Currently, the replication rpc that ships edits simply dumps the byte value of WAL edit key/value pairs into a protobuf message. Modify the replication rpc mechanism to use cell blocks so it can leverage encoding and compression.
[jira] [Updated] (HBASE-8737) [replication] Change replication RPC to use cell blocks
[ https://issues.apache.org/jira/browse/HBASE-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8737: - Fix Version/s: 0.98.0 Status: Patch Available (was: Open) [replication] Change replication RPC to use cell blocks --- Key: HBASE-8737 URL: https://issues.apache.org/jira/browse/HBASE-8737
[jira] [Commented] (HBASE-8737) [replication] Change replication RPC to use cell blocks
[ https://issues.apache.org/jira/browse/HBASE-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694118#comment-13694118 ] Hadoop QA commented on HBASE-8737: --
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589774/0001-HBASE-8737-replication-Change-replication-RPC-to-use.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning message.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.TestCheckTestClasses
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6147//console
This message is automatically generated.
[replication] Change replication RPC to use cell blocks --- Key: HBASE-8737 URL: https://issues.apache.org/jira/browse/HBASE-8737
[jira] [Commented] (HBASE-8806) Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for duplicate rows.
[ https://issues.apache.org/jira/browse/HBASE-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694130#comment-13694130 ] Dave Latham commented on HBASE-8806: A little more background on how this came up. We're currently replicating writes in both directions between two large clusters. Occasionally we would see one node's replication queue start falling behind, and once it got behind it appeared to go slower than it did while it was caught up! It would get into a cycle of replicating a batch of 25000 edits with each batch taking something like 3 minutes. Examining threads on the node receiving the writes would show the handler thread in stacks like:
{noformat}
"IPC Server handler 68 on 60020" daemon prio=10 tid=0x2aaac0d14800 nid=0x3548 runnable [0x4
   java.lang.Thread.State: RUNNABLE
        at java.util.ArrayList.<init>(ArrayList.java:112)
        at com.google.common.collect.Lists.newArrayListWithCapacity(Lists.java:168)
        at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2129)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2059)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3571)
        at sun.reflect.GeneratedMethodAccessor83.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
{noformat}
The 25000 edits were being sorted by row, with many rows ending up having multiple puts in a batch. Each time HRegion.doMiniBatchMutation encounters multiple puts to the same row, it fails to acquire the lock on that row for the second put, slowing it down. This patch makes it able to handle the full batch in one go. Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for duplicate rows. Key: HBASE-8806 URL: https://issues.apache.org/jira/browse/HBASE-8806 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.5 Reporter: rahul gidwani Fix For: 0.95.2, 0.94.10 Attachments: HBASE-8806-0.94.10.patch, HBASE-8806-0.94.10-v2.patch If we already have the lock in doMiniBatchMutation we don't need to re-acquire it. The solution is to keep a cache of the row keys already locked for a miniBatchMutation; if we already have the row key in the cache, we don't repeatedly try to acquire the lock. A fix to this problem is to keep a set of rows we already locked and not try to acquire the lock for these rows. We have tested this fix in our production environment and it has improved replication performance quite a bit. We saw a replication batch go from 3+ minutes to less than 10 seconds for batches with duplicate row keys.
{code}
static int ACQUIRE_LOCK_COUNT = 0;

@Test
public void testRedundantRowKeys() throws Exception {
  final int batchSize = 10;
  String tableName = getClass().getSimpleName();
  Configuration conf = HBaseConfiguration.create();
  conf.setClass(HConstants.REGION_IMPL, MockHRegion.class, HeapSize.class);
  MockHRegion region = (MockHRegion) TestHRegion.initHRegion(
      Bytes.toBytes(tableName), tableName, conf, Bytes.toBytes("a"));
  List<Pair<Mutation, Integer>> someBatch = Lists.newArrayList();
  int i = 0;
  while (i < batchSize) {
    if (i % 2 == 0) {
      someBatch.add(new Pair<Mutation, Integer>(new Put(Bytes.toBytes("0")), null));
    } else {
      someBatch.add(new Pair<Mutation, Integer>(new Put(Bytes.toBytes("1")), null));
    }
    i++;
  }
  long startTime = System.currentTimeMillis();
  region.batchMutate(someBatch.toArray(new Pair[0]));
  long endTime = System.currentTimeMillis();
  long duration = endTime - startTime;
  System.out.println("duration: " + duration + " ms");
  assertEquals(2, ACQUIRE_LOCK_COUNT);
}

@Override
public Integer getLock(Integer lockid, byte[] row, boolean waitForLock) throws IOException {
  ACQUIRE_LOCK_COUNT++;
  return super.getLock(lockid, row, waitForLock);
}
{code}
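A sketch of the dedup idea from the description: remember which rows this mini-batch already locked and reuse the lock id for repeats. Names are hypothetical; this is not the committed patch:
{code}
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.hbase.util.Bytes;

public class RowLockDedupSketch {
  // Rows locked so far in this mini-batch, keyed by row bytes.
  // A TreeMap with a byte[] comparator avoids byte[]'s identity-based equals.
  private final Map<byte[], Integer> acquired =
      new TreeMap<byte[], Integer>(Bytes.BYTES_COMPARATOR);

  // Stand-in for the real HRegion.getLock(null, row, true) call.
  private Integer acquireRowLock(byte[] row) { return acquired.size() + 1; }

  Integer lockOnce(byte[] row) {
    Integer existing = acquired.get(row);
    if (existing != null) {
      return existing;            // duplicate row in the batch: reuse the lock
    }
    Integer lockId = acquireRowLock(row);
    acquired.put(row, lockId);
    return lockId;
  }
}
{code}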
[jira] [Updated] (HBASE-7411) Use Netflix's Curator zookeeper library
[ https://issues.apache.org/jira/browse/HBASE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7411: - Attachment: 7411v3.txt Removed TestRecoverableZooKeeper; it expects to be able to replace zk, which is not possible when Curator is managing zk. Fixed a place where getZk could come back null (there may be others). Use Netflix's Curator zookeeper library --- Key: HBASE-7411 URL: https://issues.apache.org/jira/browse/HBASE-7411 Project: HBase Issue Type: New Feature Components: Zookeeper Affects Versions: 0.95.2 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.95.2 Attachments: 7411v2.txt, 7411v2.txt, 7411v3.txt, hbase-7411_v0.patch We have mentioned using the Curator library (https://github.com/Netflix/curator) elsewhere, but we can continue the discussion here. The advantage of the Curator lib over ours is the recipes. We have a very similar retrying mechanism, and we don't need much of the nice client-API layer. We also have a similar Listener interface, etc. I think we can decide on one of the following options:
1. Do not depend on Curator. We have some of the recipes, and some custom recipes (ZKAssign, leader election, etc. already working, locks in HBASE-5991, etc). We can also copy / fork some code from there.
2. Replace all of our zk usage / connection management with Curator. We may keep the current set of APIs as a thin wrapper.
3. Use our own connection management / retry logic, and build a custom CuratorFramework implementation for the Curator recipes. This will keep the current zk logic/code intact, and allow us to use curator-recipes as we see fit.
4. Allow both Curator and our zk layer to manage the connection. We will still have 1 connection, but 2 abstraction layers sharing it. This is the easiest to implement, but a freak show?
I have a patch for 4, and am now prototyping 2 or 3, whichever will be less painful. Related issues: HBASE-5547 HBASE-7305 HBASE-7212
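For those who have not used Curator: connection management and retries live behind a fluent client, which is the layer options 2-4 would put HBase's zk usage on. A minimal usage sketch using the Netflix-era package names current at the time (the HBase integration itself is what the patch explores; the paths and timings are illustrative):
{code}
import com.netflix.curator.framework.CuratorFramework;
import com.netflix.curator.framework.CuratorFrameworkFactory;
import com.netflix.curator.retry.ExponentialBackoffRetry;

public class CuratorSketch {
  public static void main(String[] args) throws Exception {
    // Connection management + retry logic live in the framework, not the caller.
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "localhost:2181",
        new ExponentialBackoffRetry(1000 /* base sleep ms */, 3 /* max retries */));
    client.start();
    try {
      client.create().creatingParentsIfNeeded()
            .forPath("/hbase-demo/test", "hello".getBytes());
      byte[] data = client.getData().forPath("/hbase-demo/test");
      System.out.println(new String(data));
    } finally {
      client.close();
    }
  }
}
{code}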
[jira] [Commented] (HBASE-7411) Use Netflix's Curator zookeeper library
[ https://issues.apache.org/jira/browse/HBASE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694143#comment-13694143 ] Hadoop QA commented on HBASE-7411: --
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589780/7411v3.txt against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 7 new or modified tests.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6148//console
This message is automatically generated.
Use Netflix's Curator zookeeper library --- Key: HBASE-7411 URL: https://issues.apache.org/jira/browse/HBASE-7411
[jira] [Updated] (HBASE-8800) Return non-zero exit codes when a region server aborts
[ https://issues.apache.org/jira/browse/HBASE-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8800: - Resolution: Fixed Status: Resolved (was: Patch Available) Resolving as committed. Probably too radical a change for 0.94, but will let [~lhofhansl] make the call. Return non-zero exit codes when a region server aborts -- Key: HBASE-8800 URL: https://issues.apache.org/jira/browse/HBASE-8800 Project: HBase Issue Type: Improvement Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.95.2 Attachments: HBASE-8800.patch There are a few exit code-related jiras flying around, but it seems that at least for the region server we have a bigger problem: it always returns 0 when exiting once it's started. I also saw that we have a couple of -1 exit codes; AFAIK these should be 1 (or at least a positive number).
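The convention at stake, in miniature: exit 0 only on a clean stop and a positive code (not -1) on abort, so init scripts and calling tools can inspect $?. A tiny illustrative sketch with a hypothetical runServer() helper, not the actual patch:
{code}
public class ExitCodeSketch {
  // Stand-in for the region server run loop; true means a clean stop.
  static boolean runServer() { return false; /* pretend we aborted */ }

  public static void main(String[] args) {
    boolean cleanStop = runServer();
    // Exit 0 only on a clean stop; a positive code (not -1) signals the
    // abort to init scripts and tools that inspect $?.
    System.exit(cleanStop ? 0 : 1);
  }
}
{code}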
[jira] [Commented] (HBASE-8806) Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for duplicate rows.
[ https://issues.apache.org/jira/browse/HBASE-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694147#comment-13694147 ] rahul gidwani commented on HBASE-8806: -- I will provide a patch for trunk, no problem. I should have it by tomorrow. Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for duplicate rows. Key: HBASE-8806 URL: https://issues.apache.org/jira/browse/HBASE-8806
[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694151#comment-13694151 ] Nicolas Liochon commented on HBASE-6295: For the logs, it's a bug, easy to fix. I will do it. For the failure itself, the integration test uses a retry count of 10. This is not enough. If I increase it to 30 it succeeds 5 times out of 5, while I get a 60% failure rate with a value of 10. The integration test runs with the value found in hbase-server/.../test/resources, and this value was not changed by the various jiras we had about this default value. I will run more tests during the night, but this seems to be it. Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Labels: noob Fix For: 0.98.0, 0.95.2 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch Today the batch algo is:
{noformat}
for (Operation o : List<Op>) {
  add o to todolist
  if todolist > maxsize or o last in list
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
}
{noformat}
We could:
- create the final object immediately instead of an intermediate array
- split per location immediately
- instead of sending when the list as a whole is full, send when there is enough data for a single location
It would be:
{noformat}
for (Operation o : List<Op>) {
  get location
  add o to location.todolist
  if (location.todolist > maxLocationSize)
    send location.todolist to region server
    clear location.todolist
  // don't wait, continue the loop
}
send remaining
wait
{noformat}
It's not trivial to write once you add error management: the retried list must be shared with the operations added to the todolist. But it's doable. It's interesting mainly for 'big' writes.
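A compact Java rendering of the proposed algorithm: buffer ops per location and flush a location's buffer as soon as it alone is full, instead of waiting on the global list. The Locator/Sender types are assumptions standing in for region lookup and the actual RPC:
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerLocationBatcherSketch<Op> {
  interface Locator<Op> { String locationOf(Op op); }        // hypothetical region lookup
  interface Sender<Op> { void send(String location, List<Op> ops); }

  private final Map<String, List<Op>> buffers = new HashMap<>();
  private final int maxLocationSize;
  private final Locator<Op> locator;
  private final Sender<Op> sender;

  PerLocationBatcherSketch(int maxLocationSize, Locator<Op> locator, Sender<Op> sender) {
    this.maxLocationSize = maxLocationSize;
    this.locator = locator;
    this.sender = sender;
  }

  void add(Op op) {
    String loc = locator.locationOf(op);
    List<Op> buf = buffers.computeIfAbsent(loc, k -> new ArrayList<>());
    buf.add(op);
    if (buf.size() >= maxLocationSize) { // this location is full: send now,
      sender.send(loc, buf);             // don't wait for the global batch
      buffers.put(loc, new ArrayList<>());
    }
  }

  void flushRemaining() {                // the "send remaining; wait" tail step
    buffers.forEach(sender::send);
    buffers.clear();
  }
}
{code}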
[jira] [Updated] (HBASE-7411) Use Netflix's Curator zookeeper library
[ https://issues.apache.org/jira/browse/HBASE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7411: - Attachment: 7411v4.txt Rebase Use Netflix's Curator zookeeper library --- Key: HBASE-7411 URL: https://issues.apache.org/jira/browse/HBASE-7411 Project: HBase Issue Type: New Feature Components: Zookeeper Affects Versions: 0.95.2 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.95.2 Attachments: 7411v2.txt, 7411v2.txt, 7411v3.txt, 7411v4.txt, hbase-7411_v0.patch We have mentioned using the Curator library (https://github.com/Netflix/curator) elsewhere but we can continue the discussion in this. The advantages for the curator lib over ours are the recipes. We have very similar retrying mechanism, and we don't need much of the nice client-API layer. We also have similar Listener interface, etc. I think we can decide on one of the following options: 1. Do not depend on curator. We have some of the recipes, and some custom recipes (ZKAssign, Leader election, etc already working, locks in HBASE-5991, etc). We can also copy / fork some code from there. 2. Replace all of our zk usage / connection management to curator. We may keep the current set of API's as a thin wrapper. 3. Use our own connection management / retry logic, and build a custom CuratorFramework implementation for the curator recipes. This will keep the current zk logic/code intact, and allow us to use curator-recipes as we see fit. 4. Allow both curator and our zk layer to manage the connection. We will still have 1 connection, but 2 abstraction layers sharing it. This is the easiest to implement, but a freak show? I have a patch for 4, and now prototyping 2 or 3 whichever will be less painful. Related issues: HBASE-5547 HBASE-7305 HBASE-7212 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-8809) Include deletes in the scan (setRaw) method does not respect the time range or the filter
Vasu Mariyala created HBASE-8809: Summary: Include deletes in the scan (setRaw) method does not respect the time range or the filter Key: HBASE-8809 URL: https://issues.apache.org/jira/browse/HBASE-8809 Project: HBase Issue Type: Bug Components: Scanners Reporter: Vasu Mariyala If a row has been deleted at time stamp 'T' and a scan with time range (0, T-1) is executed, it still returns the delete marker at time stamp 'T'. It is because of the code in ScanQueryMatcher.java:
if (retainDeletesInOutput
    || (!isUserScan && (EnvironmentEdgeManager.currentTimeMillis() - timestamp) <= timeToPurgeDeletes)
    || kv.getMemstoreTS() > maxReadPointToTrackVersions) {
  // always include or it is not time yet to check whether it is OK
  // to purge deletes or not
  return MatchCode.INCLUDE;
}
The assumption is that a scan (even with setRaw set to true) should respect the filters and the time range specified. Please let me know if you think this behavior can be changed so that I can provide a patch for it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
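A short client-side sketch of the reported mismatch, assuming a table handle and the delete timestamp T are already set up; Scan.setRaw and Scan.setTimeRange are the real client APIs:
{code}
Scan scan = new Scan();
scan.setRaw(true);             // include delete markers in the results
scan.setTimeRange(0, T - 1);   // upper bound below the delete's timestamp
ResultScanner scanner = table.getScanner(scan);
for (Result r : scanner) {
  // Expected: nothing at timestamp T. Reported behavior: the delete marker
  // at T is still returned, because the INCLUDE short-circuit quoted above
  // runs before the time-range and filter checks.
}
scanner.close();
{code}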
[jira] [Commented] (HBASE-8732) Changing Encoding on Column Families errors out
[ https://issues.apache.org/jira/browse/HBASE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694161#comment-13694161 ] stack commented on HBASE-8732: -- How you think this happened [~eclark]? I created a table w/ fast diff and am able to scan it. I then altered the table to disable FAST_DIFF and can still scan (even after writing). You see this on your rig? Changing Encoding on Column Families errors out --- Key: HBASE-8732 URL: https://issues.apache.org/jira/browse/HBASE-8732 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.0, 0.95.1 Reporter: Elliott Clark Priority: Critical Fix For: 0.95.2 Getting an error when opening a scanner on a file that has no encoding. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694163#comment-13694163 ] stack commented on HBASE-6295: -- +1 on committing bug fix and upping retry count as addendum on this issue. Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Labels: noob Fix For: 0.98.0, 0.95.2 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch Today the batch algo is:
{noformat}
for (Operation o : List<Op>) {
  add o to todolist
  if todolist > maxsize or o last in list
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
}
{noformat}
We could:
- create immediately the final object instead of an intermediate array
- split per location immediately
- instead of sending when the list as a whole is full, send it when there is enough data for a single location
It would be:
{noformat}
for (Operation o : List<Op>) {
  get location
  add o to location.todolist
  if (location.todolist > maxLocationSize)
    send location.todolist to region server
    clear location.todolist
    // don't wait, continue the loop
}
send remaining
wait
{noformat}
It's not trivial to write if you add error management: the retried list must be shared with the operations added in the todolist. But it's doable. It's interesting mainly for 'big' writes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8732) Changing Encoding on Column Families errors out
[ https://issues.apache.org/jira/browse/HBASE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694169#comment-13694169 ] Elliott Clark commented on HBASE-8732: -- Not really sure. I see it locally actually. When running the IT test from HBASE-8726 in maven or an ide. Changing Encoding on Column Families errors out --- Key: HBASE-8732 URL: https://issues.apache.org/jira/browse/HBASE-8732 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.0, 0.95.1 Reporter: Elliott Clark Priority: Critical Fix For: 0.95.2 Getting an error when opening a scanner on a file that has no encoding. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694178#comment-13694178 ] Elliott Clark commented on HBASE-6295: -- So I see this issue on a real cluster where the local conf is added to the classpath ahead of any jars. How would the test settings be causing this? Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Labels: noob Fix For: 0.98.0, 0.95.2 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch Today the batch algo is:
{noformat}
for (Operation o : List<Op>) {
  add o to todolist
  if todolist > maxsize or o last in list
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
}
{noformat}
We could:
- create immediately the final object instead of an intermediate array
- split per location immediately
- instead of sending when the list as a whole is full, send it when there is enough data for a single location
It would be:
{noformat}
for (Operation o : List<Op>) {
  get location
  add o to location.todolist
  if (location.todolist > maxLocationSize)
    send location.todolist to region server
    clear location.todolist
    // don't wait, continue the loop
}
send remaining
wait
{noformat}
It's not trivial to write if you add error management: the retried list must be shared with the operations added in the todolist. But it's doable. It's interesting mainly for 'big' writes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8732) Changing Encoding on Column Families errors out
[ https://issues.apache.org/jira/browse/HBASE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694181#comment-13694181 ] stack commented on HBASE-8732: -- [~eclark] Let me try... Changing Encoding on Column Families errors out --- Key: HBASE-8732 URL: https://issues.apache.org/jira/browse/HBASE-8732 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.0, 0.95.1 Reporter: Elliott Clark Priority: Critical Fix For: 0.95.2 Getting an error when opening a scanner on a file that has no encoding. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694180#comment-13694180 ] Jean-Marc Spaggiari commented on HBASE-8803: Sure! I will add a parameter like maxthreads with a default value of 1, so with no parameters it will act as before, but with this parameter we will be able to speed up the process. region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803 Project: HBase Issue Type: Bug Components: Usability Affects Versions: 0.98.0, 0.94.8, 0.95.1 Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Attachments: HBASE-8803-v0-trunk.patch Original Estimate: 48h Remaining Estimate: 48h When there are many regions in a cluster, rolling_restart can take hours because region_mover is moving the regions one by one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
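The shape of the proposed change, sketched in Java for illustration (the script itself is JRuby); moveRegion and pickTarget are placeholders for the per-region move the script already performs:
{code}
// Move up to maxthreads regions concurrently instead of one by one.
ExecutorService pool = Executors.newFixedThreadPool(maxthreads);
for (final HRegionInfo region : regionsToMove) {
  pool.submit(new Runnable() {
    public void run() {
      moveRegion(region, pickTarget());  // placeholder for the actual move
    }
  });
}
pool.shutdown();
pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);  // wait for all moves
{code}
With maxthreads = 1 this degenerates to the current one-region-at-a-time behavior, which is why the default keeps the old semantics.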
[jira] [Reopened] (HBASE-8776) tweak retry settings some more (on trunk and 0.94)
[ https://issues.apache.org/jira/browse/HBASE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reopened HBASE-8776: - tweak retry settings some more (on trunk and 0.94) -- Key: HBASE-8776 URL: https://issues.apache.org/jira/browse/HBASE-8776 Project: HBase Issue Type: Bug Affects Versions: 0.94.8 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.95.2, 0.94.10 Attachments: HBASE-8776-v0.patch, HBASE-8776-v1.patch, HBASE-8776-v1-trunk.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694186#comment-13694186 ] Sergey Shelukhin commented on HBASE-6295: - I think it might have been caused by the retry tweaking (the thing we discussed yesterday about the pause length). The pause is reduced to 100ms on trunk, while being 1000ms on 0.94, so current trunk retries are too short. Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Labels: noob Fix For: 0.98.0, 0.95.2 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch Today the batch algo is:
{noformat}
for (Operation o : List<Op>) {
  add o to todolist
  if todolist > maxsize or o last in list
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
}
{noformat}
We could:
- create immediately the final object instead of an intermediate array
- split per location immediately
- instead of sending when the list as a whole is full, send it when there is enough data for a single location
It would be:
{noformat}
for (Operation o : List<Op>) {
  get location
  add o to location.todolist
  if (location.todolist > maxLocationSize)
    send location.todolist to region server
    clear location.todolist
    // don't wait, continue the loop
}
send remaining
wait
{noformat}
It's not trivial to write if you add error management: the retried list must be shared with the operations added in the todolist. But it's doable. It's interesting mainly for 'big' writes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8776) tweak retry settings some more (on trunk and 0.94)
[ https://issues.apache.org/jira/browse/HBASE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694189#comment-13694189 ] Sergey Shelukhin commented on HBASE-8776: - hmm, the trunk pause is apparently reduced to 100ms, so these retry settings are only correct for 94. To get the same length on trunk, we'd need to put 128 back and dial the retries up to ~35 tweak retry settings some more (on trunk and 0.94) -- Key: HBASE-8776 URL: https://issues.apache.org/jira/browse/HBASE-8776 Project: HBase Issue Type: Bug Affects Versions: 0.94.8 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.95.2, 0.94.10 Attachments: HBASE-8776-v0.patch, HBASE-8776-v1.patch, HBASE-8776-v1-trunk.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
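A back-of-the-envelope check of the total client wait being discussed, using an assumed multiplier table for illustration (the real values live in HConstants.RETRY_BACKOFF):
{code}
// Total sleep across retries = pause * sum of backoff multipliers.
int[] backoff = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100};  // assumption; see HConstants.RETRY_BACKOFF
int retries = 10;
long pauseMs = 100;                                        // trunk value discussed above
long totalMs = 0;
for (int i = 0; i < retries && i < backoff.length; i++) {
  totalMs += pauseMs * backoff[i];
}
// ~38s total sleep with pause=100ms, versus ~6.3 minutes with pause=1000ms,
// which is why the retry count would need to rise to keep the same window.
{code}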
[jira] [Commented] (HBASE-8776) tweak retry settings some more (on trunk and 0.94)
[ https://issues.apache.org/jira/browse/HBASE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694191#comment-13694191 ] Sergey Shelukhin commented on HBASE-8776: - or I can just revert the change from trunk and 95. Any objections? tweak retry settings some more (on trunk and 0.94) -- Key: HBASE-8776 URL: https://issues.apache.org/jira/browse/HBASE-8776 Project: HBase Issue Type: Bug Affects Versions: 0.94.8 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.95.2, 0.94.10 Attachments: HBASE-8776-v0.patch, HBASE-8776-v1.patch, HBASE-8776-v1-trunk.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694194#comment-13694194 ] Elliott Clark commented on HBASE-6295: -- This started failing before the retry tweaks went in. And you were correct yesterday: trunk is still at 1000ms as the default pause time. ( https://github.com/apache/hbase/blob/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java#L554 ) Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Labels: noob Fix For: 0.98.0, 0.95.2 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch Today the batch algo is:
{noformat}
for (Operation o : List<Op>) {
  add o to todolist
  if todolist > maxsize or o last in list
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
}
{noformat}
We could:
- create immediately the final object instead of an intermediate array
- split per location immediately
- instead of sending when the list as a whole is full, send it when there is enough data for a single location
It would be:
{noformat}
for (Operation o : List<Op>) {
  get location
  add o to location.todolist
  if (location.todolist > maxLocationSize)
    send location.todolist to region server
    clear location.todolist
    // don't wait, continue the loop
}
send remaining
wait
{noformat}
It's not trivial to write if you add error management: the retried list must be shared with the operations added in the todolist. But it's doable. It's interesting mainly for 'big' writes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-8803: --- Attachment: HBASE-8803-v1-trunk.patch region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803 Project: HBase Issue Type: Bug Components: Usability Affects Versions: 0.98.0, 0.94.8, 0.95.1 Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Attachments: HBASE-8803-v0-trunk.patch, HBASE-8803-v1-trunk.patch Original Estimate: 48h Remaining Estimate: 48h When there are many regions in a cluster, rolling_restart can take hours because region_mover is moving the regions one by one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-8803: --- Status: Patch Available (was: Reopened) Here we go. Updated version: - Added a maxthreads parameter to region_mover.rb and graceful_stop.sh. - If maxthreads is RS then go random, else go roundrobin. Tested on 0.94; I don't have a way to test it on trunk :( region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803 Project: HBase Issue Type: Bug Components: Usability Affects Versions: 0.95.1, 0.94.8, 0.98.0 Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Attachments: HBASE-8803-v0-trunk.patch, HBASE-8803-v1-trunk.patch Original Estimate: 48h Remaining Estimate: 48h When there are many regions in a cluster, rolling_restart can take hours because region_mover is moving the regions one by one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694205#comment-13694205 ] Elliott Clark commented on HBASE-6295: -- oh blah, never mind, it's just that the fallback default wasn't changed but the xml was. It really is a 100ms base pause time. Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Labels: noob Fix For: 0.98.0, 0.95.2 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch Today the batch algo is:
{noformat}
for (Operation o : List<Op>) {
  add o to todolist
  if todolist > maxsize or o last in list
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
}
{noformat}
We could:
- create immediately the final object instead of an intermediate array
- split per location immediately
- instead of sending when the list as a whole is full, send it when there is enough data for a single location
It would be:
{noformat}
for (Operation o : List<Op>) {
  get location
  add o to location.todolist
  if (location.todolist > maxLocationSize)
    send location.todolist to region server
    clear location.todolist
    // don't wait, continue the loop
}
send remaining
wait
{noformat}
It's not trivial to write if you add error management: the retried list must be shared with the operations added in the todolist. But it's doable. It's interesting mainly for 'big' writes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694205#comment-13694205 ] Elliott Clark edited comment on HBASE-6295 at 6/26/13 8:01 PM: --- oh blah, never mind, it's just that the fallback default wasn't changed but the xml was. It really is a 100ms base pause time. I'll file a jira to make them all the same to stop confusion in the future. was (Author: eclark): oh blah, never mind, it's just that the fallback default wasn't changed but the xml was. It really is a 100ms base pause time. Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Labels: noob Fix For: 0.98.0, 0.95.2 Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch Today the batch algo is:
{noformat}
for (Operation o : List<Op>) {
  add o to todolist
  if todolist > maxsize or o last in list
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
}
{noformat}
We could:
- create immediately the final object instead of an intermediate array
- split per location immediately
- instead of sending when the list as a whole is full, send it when there is enough data for a single location
It would be:
{noformat}
for (Operation o : List<Op>) {
  get location
  add o to location.todolist
  if (location.todolist > maxLocationSize)
    send location.todolist to region server
    clear location.todolist
    // don't wait, continue the loop
}
send remaining
wait
{noformat}
It's not trivial to write if you add error management: the retried list must be shared with the operations added in the todolist. But it's doable. It's interesting mainly for 'big' writes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7411) Use Netflix's Curator zookeeper library
[ https://issues.apache.org/jira/browse/HBASE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694216#comment-13694216 ] Hadoop QA commented on HBASE-7411: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589781/7411v4.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.security.access.TestAccessController {color:red}-1 core zombie tests{color}. There are 1 zombie test(s): at org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:475) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6149//console This message is automatically generated. Use Netflix's Curator zookeeper library --- Key: HBASE-7411 URL: https://issues.apache.org/jira/browse/HBASE-7411 Project: HBase Issue Type: New Feature Components: Zookeeper Affects Versions: 0.95.2 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.95.2 Attachments: 7411v2.txt, 7411v2.txt, 7411v3.txt, 7411v4.txt, hbase-7411_v0.patch We have mentioned using the Curator library (https://github.com/Netflix/curator) elsewhere but we can continue the discussion in this. 
The advantages for the curator lib over ours are the recipes. We have very similar retrying mechanism, and we don't need much of the nice client-API layer. We also have similar Listener interface, etc. I think we can decide on one of the following options: 1. Do not depend on curator. We have some of the recipes, and some custom recipes (ZKAssign, Leader election, etc already working, locks in HBASE-5991, etc). We can also copy / fork some code from there. 2. Replace all of our zk usage / connection management to curator. We may keep the current set of API's as a thin wrapper. 3. Use our own connection management / retry logic, and build a custom CuratorFramework implementation for the curator recipes. This will keep the current zk logic/code intact, and allow us to use curator-recipes as we see fit. 4. Allow both curator and our zk layer to manage the connection. We will still have 1 connection, but 2 abstraction layers sharing it. This is the easiest to implement, but a freak show? I have a patch for 4, and now prototyping 2 or 3 whichever will be less painful. Related issues:
[jira] [Created] (HBASE-8810) Bring in code constants in line with default xml's
Elliott Clark created HBASE-8810: Summary: Bring in code constants in line with default xml's Key: HBASE-8810 URL: https://issues.apache.org/jira/browse/HBASE-8810 Project: HBase Issue Type: Bug Reporter: Elliott Clark After the defaults were changed in the xml some constants were left the same. DEFAULT_HBASE_CLIENT_PAUSE for example. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
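The class of mismatch being filed, sketched with the pause example from the description; the concrete values here are illustrative:
{code}
// hbase-default.xml now ships hbase.client.pause = 100, but the code fallback
// constant may still carry the old value:
long pause = conf.getLong(HConstants.HBASE_CLIENT_PAUSE,
    HConstants.DEFAULT_HBASE_CLIENT_PAUSE);  // stale constant if xml is absent
// Anyone reading the constant directly, or running without the XML resource
// on the classpath, silently gets the old default.
{code}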
[jira] [Commented] (HBASE-8774) Add BatchSize and Filter to Thrift2
[ https://issues.apache.org/jira/browse/HBASE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694229#comment-13694229 ] Hamed Madani commented on HBASE-8774: - Yes Lars. The reason I followed HBASE-4176 was that it made it easier for us to move from thrift1 to thrift2. Also, it has been tested before with folks running thrift1. Moreover, from what I understand from HBASE-6073, it seems
{code}
/**
 * Specify boolean operator for TFilterList:
 * - MUST_PASS_ALL means AND boolean operation
 * - MUST_PASS_ONE means OR boolean operation
 */
enum TFilterListOperator {
  MUST_PASS_ALL = 0,
  MUST_PASS_ONE = 1
}

/**
 * Represents a server side filter list
 */
struct TFilterList {
  1: required TFilterListOperator operator,
  2: required list<TFilter> filters
}
{code}
limits you to either *AND* all filters or *OR* all of them, whereas with HBASE-4176 you can have something like "(Filter1 AND Filter2) OR Filter3". HBASE-6073 also closes the scanner when it doesn't have any more results to return. I can certainly merge the two to include this feature. Let me know what you think. Add BatchSize and Filter to Thrift2 --- Key: HBASE-8774 URL: https://issues.apache.org/jira/browse/HBASE-8774 Project: HBase Issue Type: New Feature Components: Thrift Affects Versions: 0.95.1 Reporter: Hamed Madani Assignee: Hamed Madani Attachments: HBASE_8774.patch The attached patch will add BatchSize and Filter support to Thrift2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
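For reference, the hierarchical composition being argued for, expressed with the Java-side FilterList API that a thrift layer would ultimately map onto (filter1/2/3 stand for arbitrary Filter instances):
{code}
// "(Filter1 AND Filter2) OR Filter3" as nested FilterLists:
FilterList inner = new FilterList(FilterList.Operator.MUST_PASS_ALL,
    Arrays.<Filter>asList(filter1, filter2));   // Filter1 AND Filter2
FilterList outer = new FilterList(FilterList.Operator.MUST_PASS_ONE,
    Arrays.<Filter>asList(inner, filter3));     // (...) OR Filter3
scan.setFilter(outer);
{code}
A flat TFilterList with a single operator can only express this if a TFilter can itself wrap another TFilterList, which is the nesting question at issue here.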
[jira] [Commented] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694244#comment-13694244 ] Hadoop QA commented on HBASE-8803: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589789/HBASE-8803-v1-trunk.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.security.access.TestAccessController org.apache.hadoop.hbase.TestIOFencing Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6150//console This message is automatically generated. region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803 Project: HBase Issue Type: Bug Components: Usability Affects Versions: 0.98.0, 0.94.8, 0.95.1 Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Attachments: HBASE-8803-v0-trunk.patch, HBASE-8803-v1-trunk.patch Original Estimate: 48h Remaining Estimate: 48h When there are many regions in a cluster, rolling_restart can take hours because region_mover is moving the regions one by one. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8776) tweak retry settings some more (on trunk and 0.94)
[ https://issues.apache.org/jira/browse/HBASE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694252#comment-13694252 ] stack commented on HBASE-8776: -- Do whatever you need [~sershe] to get to 5 minutes or so in trunk. Thanks. tweak retry settings some more (on trunk and 0.94) -- Key: HBASE-8776 URL: https://issues.apache.org/jira/browse/HBASE-8776 Project: HBase Issue Type: Bug Affects Versions: 0.94.8 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.95.2, 0.94.10 Attachments: HBASE-8776-v0.patch, HBASE-8776-v1.patch, HBASE-8776-v1-trunk.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7411) Use Netflix's Curator zookeeper library
[ https://issues.apache.org/jira/browse/HBASE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7411: - Attachment: 7411v4.txt Retry Use Netflix's Curator zookeeper library --- Key: HBASE-7411 URL: https://issues.apache.org/jira/browse/HBASE-7411 Project: HBase Issue Type: New Feature Components: Zookeeper Affects Versions: 0.95.2 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.95.2 Attachments: 7411v2.txt, 7411v2.txt, 7411v3.txt, 7411v4.txt, 7411v4.txt, hbase-7411_v0.patch We have mentioned using the Curator library (https://github.com/Netflix/curator) elsewhere but we can continue the discussion in this. The advantages for the curator lib over ours are the recipes. We have very similar retrying mechanism, and we don't need much of the nice client-API layer. We also have similar Listener interface, etc. I think we can decide on one of the following options: 1. Do not depend on curator. We have some of the recipes, and some custom recipes (ZKAssign, Leader election, etc already working, locks in HBASE-5991, etc). We can also copy / fork some code from there. 2. Replace all of our zk usage / connection management to curator. We may keep the current set of API's as a thin wrapper. 3. Use our own connection management / retry logic, and build a custom CuratorFramework implementation for the curator recipes. This will keep the current zk logic/code intact, and allow us to use curator-recipes as we see fit. 4. Allow both curator and our zk layer to manage the connection. We will still have 1 connection, but 2 abstraction layers sharing it. This is the easiest to implement, but a freak show? I have a patch for 4, and now prototyping 2 or 3 whichever will be less painful. Related issues: HBASE-5547 HBASE-7305 HBASE-7212 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8809) Include deletes in the scan (setRaw) method does not respect the time range or the filter
[ https://issues.apache.org/jira/browse/HBASE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HBASE-8809: --- Description: If a row has been deleted at time stamp 'T' and a scan with time range (0, T-1) is executed, it still returns the delete marker at time stamp 'T'. It is because of the code in ScanQueryMatcher.java
{code}
if (retainDeletesInOutput
    || (!isUserScan && (EnvironmentEdgeManager.currentTimeMillis() - timestamp) <= timeToPurgeDeletes)
    || kv.getMemstoreTS() > maxReadPointToTrackVersions) {
  // always include or it is not time yet to check whether it is OK
  // to purge deletes or not
  return MatchCode.INCLUDE;
}
{code}
The assumption is that a scan (even with setRaw set to true) should respect the filters and the time range specified. Please let me know if you think this behavior can be changed so that I can provide a patch for it.
was:
If a row has been deleted at time stamp 'T' and a scan with time range (0, T-1) is executed, it still returns the delete marker at time stamp 'T'. It is because of the code in ScanQueryMatcher.java
if (retainDeletesInOutput
    || (!isUserScan && (EnvironmentEdgeManager.currentTimeMillis() - timestamp) <= timeToPurgeDeletes)
    || kv.getMemstoreTS() > maxReadPointToTrackVersions) {
  // always include or it is not time yet to check whether it is OK
  // to purge deletes or not
  return MatchCode.INCLUDE;
}
The assumption is that a scan (even with setRaw set to true) should respect the filters and the time range specified. Please let me know if you think this behavior can be changed so that I can provide a patch for it.
Include deletes in the scan (setRaw) method does not respect the time range or the filter - Key: HBASE-8809 URL: https://issues.apache.org/jira/browse/HBASE-8809 Project: HBase Issue Type: Bug Components: Scanners Reporter: Vasu Mariyala If a row has been deleted at time stamp 'T' and a scan with time range (0, T-1) is executed, it still returns the delete marker at time stamp 'T'. It is because of the code in ScanQueryMatcher.java
{code}
if (retainDeletesInOutput
    || (!isUserScan && (EnvironmentEdgeManager.currentTimeMillis() - timestamp) <= timeToPurgeDeletes)
    || kv.getMemstoreTS() > maxReadPointToTrackVersions) {
  // always include or it is not time yet to check whether it is OK
  // to purge deletes or not
  return MatchCode.INCLUDE;
}
{code}
The assumption is that a scan (even with setRaw set to true) should respect the filters and the time range specified. Please let me know if you think this behavior can be changed so that I can provide a patch for it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8089) Add type support
[ https://issues.apache.org/jira/browse/HBASE-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-8089: Attachment: hbase data types WIP.pdf Attaching my slides from the Hadoop Summit BoF talk per [~stack]'s suggestion. Add type support Key: HBASE-8089 URL: https://issues.apache.org/jira/browse/HBASE-8089 Project: HBase Issue Type: New Feature Components: Client Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.95.2 Attachments: HBASE-8089-types.txt, HBASE-8089-types.txt, HBASE-8089-types.txt, HBASE-8089-types.txt, hbase data types WIP.pdf This proposal outlines an improvement to HBase that provides for a set of types, above and beyond the existing byte-bucket strategy. This is intended to reduce user-level duplication of effort, provide better support for 3rd-party integration, and provide an overall improved experience for developers using HBase. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8693) Implement extensible type API based on serialization primitives
[ https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694262#comment-13694262 ] Nick Dimiduk commented on HBASE-8693: - bq. Should this work be in hbase-common rather than in hbase-client? Initial conversations required that the type stuff not be in common. I agree it makes more sense there, and I think that community opinion is changing. The current implementation doesn't bring in any dependencies, so it should be painless. bq. What is Order here? {{Order}} is a component from the {{OrderedBytes}} implementation (see patch on HBASE-8201). It enables users to store data sorted in ascending or descending order. Right now it's mostly a vestigial appendage; I don't know how the data types API wants to expose and consume this functionality. I'm hoping to gain insight from Phoenix, Kiji, etc. in future reviews. bq. When would I use isCoercibleTo? This comes from examination of Phoenix's {{PDataType}}. My understanding is, in the absence of secondary indices, the query planner can use type coercion to its advantage. This is the part of the data type API that I understand the least. I'm hoping for more clarity from [~giacomotaylor]. bq. I see a read on Union4 Sounds like a bug to me. bq. How I describe a Struct outside of a Struct..? Examples to follow. bq. What's a Binary? Equivalent to SQL BLOB. This is how a user can inject good old-fashioned {{byte[]}}s into a {{Struct}} or {{Union}}. bq. Do we need all these types? Great question. That conversation is happening up on HBASE-8089. My preference is no, but I think the SQL guys want more of these for better interoperability between them. Implement extensible type API based on serialization primitives --- Key: HBASE-8693 URL: https://issues.apache.org/jira/browse/HBASE-8693 Project: HBase Issue Type: Sub-task Components: Client Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.95.2 Attachments: 0001-HBASE-8693-Extensible-data-types-API.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
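A purely hypothetical usage shape for the Struct question above; the member-type names follow the spirit of the WIP patch and may not match whatever is eventually committed:
{code}
// Compose a two-field row-key type from order-aware primitives (hypothetical):
DataType[] members = new DataType[] {
    OrderedString.ASCENDING,   // e.g. an entity id, sorted ascending
    OrderedInt32.DESCENDING    // e.g. an inverted timestamp
};
Struct rowKeyType = new Struct(members);
// encode/decode would round-trip a (String, int) tuple while preserving the
// composite sort order in the serialized bytes.
{code}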
[jira] [Updated] (HBASE-8799) TestAccessController#testBulkLoad has been failing for some time on trunk/0.95
[ https://issues.apache.org/jira/browse/HBASE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8799: - Attachment: 8799.txt Add some debugging -- include toString of exception we are getting. TestAccessController#testBulkLoad has been failing for some time on trunk/0.95 -- Key: HBASE-8799 URL: https://issues.apache.org/jira/browse/HBASE-8799 Project: HBase Issue Type: Bug Components: Coprocessors, security, test Affects Versions: 0.98.0, 0.95.2 Reporter: Andrew Purtell Assignee: stack Fix For: 0.95.2 Attachments: 8799.txt I've observed this in Jenkins reports and also while I was working on HBASE-8692, only on trunk/0.95, not on 0.94: {quote} Failed tests: testBulkLoad(org.apache.hadoop.hbase.security.access.TestAccessController): Expected action to pass for user 'rwuser' but was denied {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8799) TestAccessController#testBulkLoad has been failing for some time on trunk/0.95
[ https://issues.apache.org/jira/browse/HBASE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8799: - Fix Version/s: 0.95.2 Assignee: stack TestAccessController#testBulkLoad has been failing for some time on trunk/0.95 -- Key: HBASE-8799 URL: https://issues.apache.org/jira/browse/HBASE-8799 Project: HBase Issue Type: Bug Components: Coprocessors, security, test Affects Versions: 0.98.0, 0.95.2 Reporter: Andrew Purtell Assignee: stack Fix For: 0.95.2 Attachments: 8799.txt I've observed this in Jenkins reports and also while I was working on HBASE-8692, only on trunk/0.95, not on 0.94: {quote} Failed tests: testBulkLoad(org.apache.hadoop.hbase.security.access.TestAccessController): Expected action to pass for user 'rwuser' but was denied {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8799) TestAccessController#testBulkLoad has been failing for some time on trunk/0.95
[ https://issues.apache.org/jira/browse/HBASE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694270#comment-13694270 ] stack commented on HBASE-8799: -- Committed 8799.txt TestAccessController#testBulkLoad has been failing for some time on trunk/0.95 -- Key: HBASE-8799 URL: https://issues.apache.org/jira/browse/HBASE-8799 Project: HBase Issue Type: Bug Components: Coprocessors, security, test Affects Versions: 0.98.0, 0.95.2 Reporter: Andrew Purtell Assignee: stack Fix For: 0.95.2 Attachments: 8799.txt I've observed this in Jenkins reports and also while I was working on HBASE-8692, only on trunk/0.95, not on 0.94: {quote} Failed tests: testBulkLoad(org.apache.hadoop.hbase.security.access.TestAccessController): Expected action to pass for user 'rwuser' but was denied {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8799) TestAccessController#testBulkLoad has been failing for some time on trunk/0.95
[ https://issues.apache.org/jira/browse/HBASE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8799: - Status: Patch Available (was: Open) Submitting patch to hadoopqa to see if it gets me a message too. TestAccessController#testBulkLoad has been failing for some time on trunk/0.95 -- Key: HBASE-8799 URL: https://issues.apache.org/jira/browse/HBASE-8799 Project: HBase Issue Type: Bug Components: Coprocessors, security, test Affects Versions: 0.98.0, 0.95.2 Reporter: Andrew Purtell Assignee: stack Fix For: 0.95.2 Attachments: 8799.txt I've observed this in Jenkins reports and also while I was working on HBASE-8692, only on trunk/0.95, not on 0.94: {quote} Failed tests: testBulkLoad(org.apache.hadoop.hbase.security.access.TestAccessController): Expected action to pass for user 'rwuser' but was denied {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8799) TestAccessController#testBulkLoad has been failing for some time on trunk/0.95
[ https://issues.apache.org/jira/browse/HBASE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694278#comment-13694278 ] Hadoop QA commented on HBASE-8799: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589799/8799.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6152//console This message is automatically generated. TestAccessController#testBulkLoad has been failing for some time on trunk/0.95 -- Key: HBASE-8799 URL: https://issues.apache.org/jira/browse/HBASE-8799 Project: HBase Issue Type: Bug Components: Coprocessors, security, test Affects Versions: 0.98.0, 0.95.2 Reporter: Andrew Purtell Assignee: stack Fix For: 0.95.2 Attachments: 8799.txt I've observed this in Jenkins reports and also while I was working on HBASE-8692, only on trunk/0.95, not on 0.94: {quote} Failed tests: testBulkLoad(org.apache.hadoop.hbase.security.access.TestAccessController): Expected action to pass for user 'rwuser' but was denied {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694296#comment-13694296 ] Varun Sharma commented on HBASE-8370: - So, it seems that we have per-block-type metrics from SchemaMetrics under the region server, and they are exposed at /jmx. The question is which metric we should report on the region server UI. Right now all our clusters show a 99% cache hit ratio, which is misleading, since 20% of the time there is a DataBlock miss and we are hitting disk for 20% of requests. I have been misled by this number in the past, and I think there could be others who are being similarly misled. So, should we just report another, more representative metric on the region server console? Varun Report data block cache hit rates apart from aggregate cache hit rates -- Key: HBASE-8370 URL: https://issues.apache.org/jira/browse/HBASE-8370 Project: HBase Issue Type: Improvement Components: metrics Reporter: Varun Sharma Assignee: Varun Sharma Priority: Minor Attaching from mail to d...@hbase.apache.org I am wondering whether the HBase cachingHitRatio metrics that the region server UI shows can get me a breakdown by data blocks. I always see this number to be very high, and that could be exaggerated by the fact that each lookup hits the index blocks and bloom filter blocks in the block cache before retrieving the data block. This could be artificially bloating up the cache hit ratio. Assuming the above is correct, do we already have a cache hit ratio for data blocks alone, which is more obscure? If not, my sense is that it would be pretty valuable to add one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
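Made-up numbers illustrating how index hits drown out data-block misses in the aggregate ratio:
{code}
long indexReads = 50000, indexHits = 50000;  // index/bloom blocks: 100% hit
long dataReads  = 1000,  dataHits  = 800;    // data blocks: only 80% hit
double aggregate = 100.0 * (indexHits + dataHits) / (indexReads + dataReads);  // ~99.6%
double dataOnly  = 100.0 * dataHits / dataReads;                               // 80.0%
// The UI reports the first number; the disk seeks come from the second.
{code}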
[jira] [Commented] (HBASE-8802) totalCompactingKVs overflow
[ https://issues.apache.org/jira/browse/HBASE-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694313#comment-13694313 ] Hudson commented on HBASE-8802: --- Integrated in hbase-0.95 #270 (See [https://builds.apache.org/job/hbase-0.95/270/]) HBASE-8802 totalCompactingKVs overflow (Chao Shi) (Revision 1497051) Result = FAILURE sershe : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java totalCompactingKVs overflow --- Key: HBASE-8802 URL: https://issues.apache.org/jira/browse/HBASE-8802 Project: HBase Issue Type: Bug Reporter: Chao Shi Priority: Trivial Attachments: hbase-8802.patch I happened to get a very large region (mistakenly bulk loading tons of HFiles into a single region). When it's getting compacted, the webUI shows an overflowed totalCompactingKVs. I found this is due to Compactor#FileDetails#maxKeyCount being int32. It is not a big deal, as this variable is only used for displaying compaction progress and everywhere else uses long. totalCompactingKVs=1909276739, currentCompactedKVs=11308733425, -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694320#comment-13694320 ] Elliott Clark commented on HBASE-8370: -- We don't have per-block-type metrics in trunk/95 because the overall cache hit percentage is a good proxy for the data block cache percent. Yes, the overall number is higher, but it still gives a good, actionable number. You can know if you're doing better or worse than you were before. Even better is the derivative of the cache miss count. Overall, SchemaMetrics cost HBase about 10% of its performance, and I just don't think enough people got enough out of it to keep per-cf, per-block-type metrics. Maybe we should show the percentage to more decimal figures so that it's more obvious that there are some misses? But overall, while the UI is nice, it's not what should be used for figuring these things out. That should be done by your metrics system (CM, Ganglia, OpenTSDB, etc). Report data block cache hit rates apart from aggregate cache hit rates -- Key: HBASE-8370 URL: https://issues.apache.org/jira/browse/HBASE-8370 Project: HBase Issue Type: Improvement Components: metrics Reporter: Varun Sharma Assignee: Varun Sharma Priority: Minor Attaching from mail to d...@hbase.apache.org I am wondering whether the HBase cachingHitRatio metrics that the region server UI shows can get me a breakdown by data blocks. I always see this number to be very high, and that could be exaggerated by the fact that each lookup hits the index blocks and bloom filter blocks in the block cache before retrieving the data block. This could be artificially bloating up the cache hit ratio. Assuming the above is correct, do we already have a cache hit ratio for data blocks alone, which is more obscure? If not, my sense is that it would be pretty valuable to add one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8228) Investigate time taken to snapshot memstore
[ https://issues.apache.org/jira/browse/HBASE-8228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694319#comment-13694319 ] Amitanand Aiyer commented on HBASE-8228: I think open source is using one memstore flusher. Multiple memstore flush threads were added pretty recently. This was part of the effort to reduce the time it takes to send machines out for repairs etc. Investigate time taken to snapshot memstore --- Key: HBASE-8228 URL: https://issues.apache.org/jira/browse/HBASE-8228 Project: HBase Issue Type: Sub-task Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Minor Fix For: 0.89-fb Snapshotting memstores is normally quick, but sometimes it seems to take a long time. This JIRA is to track the investigation and fix to improve the outliers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7411) Use Netflix's Curator zookeeper library
[ https://issues.apache.org/jira/browse/HBASE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694332#comment-13694332 ] Hadoop QA commented on HBASE-7411: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589796/7411v4.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestHCM org.apache.hadoop.hbase.security.access.TestAccessController {color:red}-1 core zombie tests{color}. There are 1 zombie test(s): at org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:475) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6151//console This message is automatically generated. Use Netflix's Curator zookeeper library --- Key: HBASE-7411 URL: https://issues.apache.org/jira/browse/HBASE-7411 Project: HBase Issue Type: New Feature Components: Zookeeper Affects Versions: 0.95.2 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.95.2 Attachments: 7411v2.txt, 7411v2.txt, 7411v3.txt, 7411v4.txt, 7411v4.txt, hbase-7411_v0.patch We have mentioned using the Curator library (https://github.com/Netflix/curator) elsewhere but we can continue the discussion in this. 
The advantage of the Curator lib over ours is the recipes. We have a very similar retrying mechanism, and we don't need much of the nice client-API layer. We also have a similar Listener interface, etc. I think we can decide on one of the following options: 1. Do not depend on Curator. We have some of the recipes, and some custom recipes (ZKAssign, leader election, etc. already working, locks in HBASE-5991, etc.). We can also copy / fork some code from there. 2. Replace all of our zk usage / connection management with Curator. We may keep the current set of APIs as a thin wrapper. 3. Use our own connection management / retry logic, and build a custom CuratorFramework implementation for the Curator recipes. This will keep the current zk logic/code intact, and allow us to use curator-recipes as we see fit. 4. Allow both Curator and our zk layer to manage the connection. We will still have 1 connection, but 2 abstraction layers sharing it. This is the easiest to implement, but a freak show? I have a patch for 4, and am now prototyping 2
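For reference, a minimal sketch of the Curator client API that options 2-4 would wrap, assuming the then-current com.netflix.curator packages; the connect string and znode path are placeholders, and nothing here is taken from the attached patches.
{code}
import com.netflix.curator.framework.CuratorFramework;
import com.netflix.curator.framework.CuratorFrameworkFactory;
import com.netflix.curator.retry.ExponentialBackoffRetry;

public class CuratorSketch {
  public static void main(String[] args) throws Exception {
    // Connection management and retries handled by Curator rather than
    // HBase's own ZooKeeperWatcher / RecoverableZooKeeper layer.
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "localhost:2181", new ExponentialBackoffRetry(1000, 3));
    client.start();
    byte[] data = client.getData().forPath("/hbase/master"); // placeholder znode
    System.out.println(new String(data));
    client.close();
  }
}
{code}
Option 3 would keep HBase's existing connection management underneath and implement only the CuratorFramework surface that the recipes need.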
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694336#comment-13694336 ] Varun Sharma commented on HBASE-8370: - bq. We don't have per block type metrics in trunk/95 because the overall cache hit percentage is a good proxy for data block cache percent. Yes the overall number is higher but it still gives a good actionable number. You can know if you're doing better or worse than you were before. Even better is the derivative of cache miss count. I am not sure this is true - this number is always 99 % for us on all clusters - blockCacheHitCachingRatio - how can a number which never changes ever be actionable ? Even with decimal places, it's never going to change because the index blocks are going to take over. Also, the difference b/w an 82 % cache hit ratio and a 99 % cache hit ratio is enormous. Controlling your p80 on latency is a *lot* easier than your p99. A cache hit ratio of 99 % just sends you this false sense of security that you have controlled your p99 latency. This is important for online serving, maybe not for enterprise. I guess we don't need to bring back SchemaMetrics to fix this, but we can have block-level metrics. At least I want to be sure that index blocks have 100 % cache hit rates, because if that's not happening, then I am in a bad situation. It would be better to not have folks using HBase for online storage play a guessing game as to what the true effectiveness of the cache is.
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694343#comment-13694343 ] Elliott Clark commented on HBASE-8370: -- bq. this number is always 99 % for us on all clusters That's why I said we need more decimal places for it. bq. Also, the difference b/w an 82 % cache hit ratio and a 99 % cache hit ratio is enormous. But that 82% doesn't tell you anything all by itself. For a given workload, is 80% good or bad? You don't know. That percentage is really only useful if you have a baseline, so it's equally informative if the cache percentage goes from 99 to 98 or from 84 to 83. Additionally, gauges are bad. They just don't tell a great story. There's a lot of lossy data there; sampling times can skew your picture of what's actually happening. See [~phobos182]'s slides (https://speakerdeck.com/phobos182/metrics-at-pinterest) on why you should prefer counters over gauges. That's why I said that the derivative of cache miss count is the best way to look at cache efficacy. It gives you an accurate count of the number of times you have to go to hdfs (not really disk since there can be os cache there). It also provides a good way to compare today to yesterday.
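A small sketch of the counter-over-gauge idea: sample the monotonically increasing miss counter twice and take the derivative yourself. The fetch helper below is a placeholder for whatever your metrics system exposes, not an HBase API.
{code}
public class MissRateSketch {
  // Placeholder source standing in for a real metrics query.
  static long fetchBlockCacheMissCount() {
    return System.nanoTime() % 100000;
  }

  public static void main(String[] args) throws InterruptedException {
    long t1 = System.currentTimeMillis();
    long m1 = fetchBlockCacheMissCount();
    Thread.sleep(60000L); // one sampling interval
    long t2 = System.currentTimeMillis();
    long m2 = fetchBlockCacheMissCount();
    // Rate of misses per second over the interval: each miss is a trip to HDFS.
    double missesPerSecond = (m2 - m1) * 1000.0 / (t2 - t1);
    System.out.println("misses/sec = " + missesPerSecond);
  }
}
{code}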
[jira] [Commented] (HBASE-8802) totalCompactingKVs overflow
[ https://issues.apache.org/jira/browse/HBASE-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694357#comment-13694357 ] Hudson commented on HBASE-8802: --- Integrated in hbase-0.95-on-hadoop2 #150 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/150/]) HBASE-8802 totalCompactingKVs overflow (Chao Shi) (Revision 1497051) Result = FAILURE sershe : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694361#comment-13694361 ] Varun Sharma commented on HBASE-8370: - Having a cache hit ratio of 80 % means that at least 80 % of my requests are fast (assuming GC is out of the picture) - in the current scenario, it may map to a number like 99.9 %, and tomorrow if I had 0 % cache hits for data blocks, the number comes down to 99.5 % - I am able to calculate this based on the numbers I pasted above. It assumes a certain distribution b/w the number of accesses to index blocks and data blocks. Tomorrow, if the distribution changes, it may well be that a 99.5 % overall cache hit ratio corresponds to a 90 % hit rate on data blocks. So, I don't think that the overall cache hit ratio is a good proxy for the data block cache hit ratio. As far as derivatives go, the miss count derivative can go up with other things like read request count - so now we would also need to take a derivative on that counter and compare, etc. On 0.94, that number has been overflowing for us all the time and is -ve; is that being fixed in trunk ? I don't think this is about counters vs gauges. I am fine with exposing counters per block type. Right now, I just don't have any insight into the block cache, which plays an important role in serving reads. When a compaction happens and new files are written, I don't know the number of cache misses for index blocks vs data blocks vs bloom blocks. I would no longer know how many data blocks are being accessed and how many index blocks, etc.
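A toy calculation along the lines Varun describes, with hypothetical counts, showing how near-perfect index hits mask a poor data-block hit rate:
{code}
public class RatioSketch {
  public static void main(String[] args) {
    // Hypothetical counts: pinned index blocks inflate the aggregate.
    long indexHits = 50000, indexReads = 50000; // index blocks: ~100% hits
    long dataHits = 200, dataReads = 1000;      // data blocks: 20% hits
    double overall = 100.0 * (indexHits + dataHits) / (indexReads + dataReads);
    double dataOnly = 100.0 * dataHits / dataReads;
    System.out.printf("overall=%.1f%% dataOnly=%.1f%%%n", overall, dataOnly);
    // prints overall=98.4% dataOnly=20.0%
  }
}
{code}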
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694377#comment-13694377 ] Elliott Clark commented on HBASE-8370: -- bq. Having a cache hit ratio of 80 % means that at least 80 % of my requests are fast I would disagree. * Full handlers * Giant gets of large amounts of data. * Gets without a proper bloom filter. * Things that skip past lots of (cached) blocks * Slow data block encoding * Slow filters * Slow network * Lock contention * GC There are TONS of other reasons that your requests can be slow. And without knowing the workload you can't tell if a cache miss is more or less likely than any other explanation. I've seen workloads where the cache percent was in the low teens and I've seen workloads where the cache percent was really 100%. There's no way a priori to know if a number is good or bad. So you again are back to using the metrics with a baseline and comparing them. For that the absolute numbers are less important. bq. As far as derivatives go, the miss count derivative can go up with other things like read request count Yep, and that makes things harder, but the only thing that's not susceptible is gauges. And like I said before, I'm trying to move us off of gauges. bq. I don't know the number of cache misses for index blocks vs data blocks vs bloom blocks. I would no longer know how many data blocks are being accessed and how many index blocks, etc. But those aren't actionable metrics. * If your bloom block cache hit count goes down you can do... Not much. Not worth counting if you can't take action on it. * With the way the index blocks work you can't cache miss them, after the first time, unless we're OOM (they aren't ever evicted, even if you turn off caching the cf). So you'll see that there are some misses on region open, and any time there's a new flush or compaction. So it will be 100%. Compaction and flush metrics are much more useful here for determining this kind of thing, so there's no need to add more metrics for something that's better covered somewhere else. * So data blocks are the only useful one, and they dominate the number of blocks requested. So this can pretty well be covered by the following: ** blockCacheExpressHitPercent ** blockCountHitPercent ** blockCacheHitCount ** blockCacheMissCount I'm -1 on adding any more metrics on the read path unless there's something that's totally missed (Jeremy brought up a couple the last time I met with him). That code is just too important to be instrumented any more for things that can be figured out other ways (and I would argue better ways, but that's less important). I'm +1 on making that cache hit percent a double so there's more accuracy.
[jira] [Commented] (HBASE-8496) Implement tags and the internals of how a tag should look like
[ https://issues.apache.org/jira/browse/HBASE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694379#comment-13694379 ] Ted Yu commented on HBASE-8496: --- On page 6: bq. In case of per HFile The sentence seems to be incomplete. bq. once we close the file we add the Meta data saying tagpresent = true and avg_tag_len = 0. avg_tag_len = 0 would indicate that there is no tag present. Why do we need two flags (tagpresent and avg_tag_len) ? Later, compaction is mentioned where tagpresent is changed to false. But we should be able to achieve this at the time of flush, right ?
{code}
byte[] tagArray = kv.getTagsArray();
Tag decodeTag = KeyValueUtil.decodeTag(tagArray);
{code}
In the above sample, I would expect decodeTag() to return more than one Tag. Would all Tags in the KeyValue be returned to filterKeyValue() ? I think it would be better if Tag.Type.Visibility is passed to decodeTag() so that only the visibility Tag is returned. Implement tags and the internals of how a tag should look like -- Key: HBASE-8496 URL: https://issues.apache.org/jira/browse/HBASE-8496 Project: HBase Issue Type: New Feature Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.98.0 Attachments: Tag design.pdf The intent of this JIRA comes from HBASE-7897. This would help us to decide on the structure and format of how the tags should look. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
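Since the tag format was still under design in this JIRA, the following is only a hypothetical sketch of walking a tags buffer laid out as repeated [2-byte total length][1-byte type][value] entries; names and layout are illustrative, not the final HBASE-8496 format.
{code}
import java.nio.ByteBuffer;

public class TagSketch {
  // Walks a hypothetical tags buffer, printing each tag's type and value length.
  public static void dump(byte[] tags) {
    ByteBuffer buf = ByteBuffer.wrap(tags);
    while (buf.remaining() >= 3) {
      short totalLen = buf.getShort();      // type byte + value bytes
      byte type = buf.get();                // e.g. a visibility tag type
      byte[] value = new byte[totalLen - 1];
      buf.get(value);
      System.out.println("type=" + type + " len=" + value.length);
    }
  }

  public static void main(String[] args) {
    ByteBuffer b = ByteBuffer.allocate(7);
    b.putShort((short) 5).put((byte) 1).put("acl!".getBytes());
    dump(b.array()); // prints: type=1 len=4
  }
}
{code}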
[jira] [Commented] (HBASE-8790) NullPointerException thrown when stopping regionserver
[ https://issues.apache.org/jira/browse/HBASE-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694384#comment-13694384 ] Ted Yu commented on HBASE-8790: --- Integrated to 0.95 and trunk. Thanks for the patch, Liang. Thanks for the review, Ram. NullPointerException thrown when stopping regionserver -- Key: HBASE-8790 URL: https://issues.apache.org/jira/browse/HBASE-8790 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.95.1 Environment: CentOS 5.9 x86_64, java version 1.6.0_45, CDH4.3 Reporter: Xiong LIU Assignee: Liang Xie Attachments: HBase-8790.txt The HBase cluster is a fresh start with one regionserver. When we stop HBase, an unhandled NullPointerException is thrown in the regionserver. The regionserver's log is as follows:
2013-06-21 10:21:11,284 INFO [regionserver61020] regionserver.HRegionServer: Closing user regions
2013-06-21 10:21:14,288 DEBUG [regionserver61020] regionserver.HRegionServer: Waiting on 1028785192
2013-06-21 10:21:14,290 FATAL [regionserver61020] regionserver.HRegionServer: ABORTING region server HOSTNAME_TEST,61020,1371781086817 : Unhandled: null java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:988) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:832) at java.lang.Thread.run(Thread.java:662)
2013-06-21 10:21:14,292 FATAL [regionserver61020] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
2013-06-21 10:21:14,293 INFO [regionserver61020] regionserver.HRegionServer: STOPPED: Unhandled: null
2013-06-21 10:21:14,293 INFO [regionserver61020] ipc.RpcServer: Stopping server on 61020
It seems that after closing user regions, the rssStub is null. Update: we found that if we set hbase.client.ipc.pool.type to RoundRobinPool (or another pool type) and hbase.client.ipc.pool.size to 10 (possibly other values) in hbase-site.xml, the regionserver continuously attempts to connect to the master, and if we stop HBase, the above NullPointerException occurs. With hbase.client.ipc.pool.size set to 1, the cluster can be completely stopped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
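The stack trace points at tryRegionServerReport dereferencing rssStub after shutdown has nulled it. A hedged sketch of the guard pattern such a fix implies (types simplified to keep the example self-contained; not the literal patch):
{code}
public class StubGuardSketch {
  interface MasterStub { void regionServerReport(); }

  private volatile MasterStub rssStub; // nulled while the server is stopping

  void tryRegionServerReport() {
    // Read the stub once into a local so a concurrent shutdown can't
    // null it between the check and the call.
    MasterStub rss = rssStub;
    if (rss == null) {
      return; // master connection already torn down; skip this report
    }
    rss.regionServerReport();
  }
}
{code}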
[jira] [Updated] (HBASE-8790) NullPointerException thrown when stopping regionserver
[ https://issues.apache.org/jira/browse/HBASE-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8790: -- Fix Version/s: 0.95.2 0.98.0 Hadoop Flags: Reviewed
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694389#comment-13694389 ] Varun Sharma commented on HBASE-8370: - We can make the hit percent a double. But if we never evict index blocks, one option is to only count DataBlocks for HitPercent, CacheHitCount, CacheMissCount. I know that is not the case for 0.94. Is that the case for trunk or can we change these metrics to only instrument data blocks then ? Anyone else have opinions ?
[jira] [Commented] (HBASE-8802) totalCompactingKVs overflow
[ https://issues.apache.org/jira/browse/HBASE-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694393#comment-13694393 ] Hudson commented on HBASE-8802: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #585 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/585/]) HBASE-8802 totalCompactingKVs overflow (Chao Shi) (Revision 1497050) Result = FAILURE sershe : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java
[jira] [Commented] (HBASE-8799) TestAccessController#testBulkLoad has been failing for some time on trunk/0.95
[ https://issues.apache.org/jira/browse/HBASE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694394#comment-13694394 ] Hudson commented on HBASE-8799: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #585 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/585/]) HBASE-8799 TestAccessController#testBulkLoad has been failing for some time on trunk/0.95 -- ADDING DEBUG (Revision 1497123) Result = FAILURE stack : Files : * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java TestAccessController#testBulkLoad has been failing for some time on trunk/0.95 -- Key: HBASE-8799 URL: https://issues.apache.org/jira/browse/HBASE-8799 Project: HBase Issue Type: Bug Components: Coprocessors, security, test Affects Versions: 0.98.0, 0.95.2 Reporter: Andrew Purtell Assignee: stack Fix For: 0.95.2 Attachments: 8799.txt I've observed this in Jenkins reports and also while I was working on HBASE-8692, only on trunk/0.95, not on 0.94: {quote} Failed tests: testBulkLoad(org.apache.hadoop.hbase.security.access.TestAccessController): Expected action to pass for user 'rwuser' but was denied {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8410) Basic quota support for namespaces
[ https://issues.apache.org/jira/browse/HBASE-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694401#comment-13694401 ] Ted Yu commented on HBASE-8410: --- For NamespaceController, please add class javadoc and an audience annotation.
{code}
+zkManager = new ZKNamespaceManager(zk);
+zkManager.start();
{code}
I saw the following in TableNamespaceManager:
{code}
+zkNamespaceManager = new ZKNamespaceManager(masterServices.getZooKeeper());
+zkNamespaceManager.start();
{code}
So we may have more than one ZKNamespaceManager running in the same JVM ? The spacing is slightly off: two spaces should be used per indentation level.
{code}
+maxRegions = Long.parseLong(value);
+ } catch (NumberFormatException exp) {
+throw new ConstraintException("NumberFormatException while getting max regions.", exp);
{code}
Please include value in the exception message.
{code}
+currentStatus = getNamespaceQuota(ctx.getEnvironment().getConfiguration(),
+ nspdesc.getName());
{code}
I think there should be a better name for getNamespaceQuota, because the quota should be the setting governing the namespace, which is different from the current status of the underlying namespace. I wonder if using table and region count is a good way of enforcing quota, because the underlying region size can vary.
{code}
++ " is not allowed to have " + regions.length
++ " number of regions. The total number of regions permitted are only "
{code}
Remove 'number of '. 'permitted are only' -> 'permitted is only'. For class NamespaceQuota, please add javadoc and an audience annotation. I think the class should be renamed because it reflects the status of a namespace, not its quota. Basic quota support for namespaces -- Key: HBASE-8410 URL: https://issues.apache.org/jira/browse/HBASE-8410 Project: HBase Issue Type: Sub-task Reporter: Francis Liu Assignee: Vandana Ayyalasomayajula Attachments: HBASE_8410.patch This task involves creating an observer which provides basic quota support to namespaces in terms of (1) number of tables and (2) number of regions. The quota support can be enabled by setting the following in hbase-site.xml:
{code}
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.namespace.NamespaceController</value>
</property>
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.hadoop.hbase.namespace.NamespaceController</value>
</property>
{code}
To add quotas to a namespace, properties need to be added while creating the namespace. Examples: 1. namespace_create 'ns1', {'hbase.namespace.quota.maxregion'=>'10'} 2. namespace_create 'ns2', {'hbase.namespace.quota.maxtables'=>'2'}, {'hbase.namespace.quota.maxregion'=>'5'} The quotas can be modified/added to the namespace at any point of time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
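On the request to include the value in the exception message, a small sketch of what the reworked catch block could look like (ConstraintException swapped for IllegalArgumentException so the snippet is self-contained):
{code}
public class QuotaParseSketch {
  static long parseMaxRegions(String value) {
    try {
      return Long.parseLong(value);
    } catch (NumberFormatException exp) {
      // Review suggestion: surface the offending value in the message.
      throw new IllegalArgumentException(
          "NumberFormatException while getting max regions (value=" + value + ")", exp);
    }
  }
}
{code}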
[jira] [Updated] (HBASE-8802) totalCompactingKVs may overflow
[ https://issues.apache.org/jira/browse/HBASE-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8802: -- Fix Version/s: 0.95.2 0.98.0 Summary: totalCompactingKVs may overflow (was: totalCompactingKVs overflow) Hadoop Flags: Reviewed
[jira] [Assigned] (HBASE-8802) totalCompactingKVs may overflow
[ https://issues.apache.org/jira/browse/HBASE-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-8802: - Assignee: Chao Shi
[jira] [Created] (HBASE-8811) REST service ignores misspelled check= parameter, causing unexpected mutations
Chip Salzenberg created HBASE-8811: -- Summary: REST service ignores misspelled check= parameter, causing unexpected mutations Key: HBASE-8811 URL: https://issues.apache.org/jira/browse/HBASE-8811 Project: HBase Issue Type: Bug Components: REST Affects Versions: 0.95.1 Reporter: Chip Salzenberg Priority: Critical In rest.RowResource.update(), this code keeps executing a request if a misspelled check= parameter is provided.
{noformat}
if (CHECK_PUT.equalsIgnoreCase(check)) {
  return checkAndPut(model);
} else if (CHECK_DELETE.equalsIgnoreCase(check)) {
  return checkAndDelete(model);
} else if (check != null && check.length() > 0) {
  LOG.warn("Unknown check value: " + check + ", ignored");
}
{noformat}
By my reading of the code, this results in the provided cell value that was intended as a check instead being treated as a mutation, which is sure to destroy user data. Thus the priority of this bug, as it can cause corruption. I suggest that a better reaction than a warning would be, approximately:
{noformat}
return Response.status(Response.Status.BAD_REQUEST)
    .type(MIMETYPE_TEXT).entity("Invalid check value '" + check + "'")
    .build();
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates
[ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694433#comment-13694433 ] Varun Sharma commented on HBASE-8370: - Also, coming back to the point about the metrics being actionable. bq. If your bloom block cache hit count goes down you can do... Not much. Not worth counting if you can't take action on it. I disagree that it's not actionable. I would go fix the block cache in this case. It means there is something seriously wrong with our implementation of the block cache if we are evicting bloom blocks - maybe it's just me, but I feel we should not be evicting bloom blocks. If the cache hit rate is too low on data blocks, the action item is to increase the block cache size. I would agree that index block metrics are not needed or actionable if it is indeed the case that we pin index blocks forever.
[jira] [Created] (HBASE-8812) Avoid a wide line on the HMaster webUI if we have more zookeeper quorums
Fengdong Yu created HBASE-8812: -- Summary: Avoid a wide line on the HMaster webUI if we have more zookeeper quorums Key: HBASE-8812 URL: https://issues.apache.org/jira/browse/HBASE-8812 Project: HBase Issue Type: Improvement Components: master Reporter: Fengdong Yu Priority: Minor -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8812) Avoid a wide line on the HMaster webUI if we have more zookeeper quorums
[ https://issues.apache.org/jira/browse/HBASE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated HBASE-8812: --- Description: add a line break for every four zookeeper quorums on the HMaster webUI. Avoid a wide line on the HMaster webUI if we have more zookeeper quorums Key: HBASE-8812 URL: https://issues.apache.org/jira/browse/HBASE-8812 Project: HBase Issue Type: Improvement Components: master Reporter: Fengdong Yu Priority: Minor add a line break for every four zookeeper quorums on the HMaster webUI. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8812) Avoid a wide line on the HMaster webUI if we have more zookeeper quorums
[ https://issues.apache.org/jira/browse/HBASE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated HBASE-8812: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-8790) NullPointerException thrown when stopping regionserver
[ https://issues.apache.org/jira/browse/HBASE-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694442#comment-13694442 ] Hudson commented on HBASE-8790: --- Integrated in hbase-0.95 #271 (See [https://builds.apache.org/job/hbase-0.95/271/]) HBASE-8790 NullPointerException thrown when stopping regionserver (Liang Xie) (Revision 1497172) Result = FAILURE tedyu : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
[jira] [Updated] (HBASE-8812) Avoid a wide line on the HMaster webUI if we have more zookeeper quorums
[ https://issues.apache.org/jira/browse/HBASE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated HBASE-8812: --- Attachment: HBASE-8812.patch
[jira] [Updated] (HBASE-8812) Avoid a wide line on the HMaster webUI if we have more zookeeper quorums
[ https://issues.apache.org/jira/browse/HBASE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated HBASE-8812: --- Description: add a line break for every four zookeeper quorums on the HMaster webUI. I don't think this needs a test case; manual testing is enough. I've tested on my test cluster and everything works well. (was: add a line break for every four zookeeper quorums on the HMaster webUI.)
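A hypothetical helper matching the described change: insert an HTML line break after every fourth server in the quorum string so the master UI row stays narrow. Names are illustrative, not from the attached patch.
{code}
public class QuorumFormatSketch {
  static String format(String quorum) {
    String[] servers = quorum.split(",");
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < servers.length; i++) {
      sb.append(servers[i]);
      if (i < servers.length - 1) {
        sb.append(",");
        if (i % 4 == 3) sb.append("<br/>"); // break after every 4th server
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // prints: zk1,zk2,zk3,zk4,<br/>zk5,zk6
    System.out.println(format("zk1,zk2,zk3,zk4,zk5,zk6"));
  }
}
{code}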
[jira] [Assigned] (HBASE-8808) Use Jacoco to generate Unit Test coverage reports
[ https://issues.apache.org/jira/browse/HBASE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju reassigned HBASE-8808: - Assignee: Manukranth Kolloju Use Jacoco to generate Unit Test coverage reports - Key: HBASE-8808 URL: https://issues.apache.org/jira/browse/HBASE-8808 Project: HBase Issue Type: Bug Reporter: Manukranth Kolloju Assignee: Manukranth Kolloju Priority: Trivial Enabling the code coverage tool jacoco in maven -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HBASE-8808) Use Jacoco to generate Unit Test coverage reports
[ https://issues.apache.org/jira/browse/HBASE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-8808 started by Manukranth Kolloju.
[jira] [Updated] (HBASE-8808) Use Jacoco to generate Unit Test coverage reports
[ https://issues.apache.org/jira/browse/HBASE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju updated HBASE-8808: -- Attachment: Screen Shot 2013-06-25 at 11.35.30 AM.png Attaching a sample report of the test coverage in our test suite.
[jira] [Commented] (HBASE-8808) Use Jacoco to generate Unit Test coverage reports
[ https://issues.apache.org/jira/browse/HBASE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694460#comment-13694460 ] Manukranth Kolloju commented on HBASE-8808: --- The unit test coverage doesn't take too long: roughly, unit test execution time is increased by 10-20%. The report generation itself is quick once the execution data is collected. A very nice feature of this tool is that it shows branch coverage, unlike other code coverage tools out there. I can create a patch and port it to the open source branch if people are interested in this tool and if it is possible.
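For readers wanting to try this, a hedged sketch of the usual jacoco-maven-plugin wiring in a pom.xml; the version and goal bindings are typical for mid-2013 JaCoCo and are assumed rather than taken from the FB-branch change.
{code}
<!-- Hedged sketch: typical jacoco-maven-plugin wiring circa mid-2013.
     Version and goal bindings assumed, not taken from the patch. -->
<plugin>
  <groupId>org.jacoco</groupId>
  <artifactId>jacoco-maven-plugin</artifactId>
  <version>0.6.3.201306030806</version>
  <executions>
    <execution>
      <id>prepare-agent</id>
      <goals><goal>prepare-agent</goal></goals>
    </execution>
    <execution>
      <id>report</id>
      <phase>test</phase>
      <goals><goal>report</goal></goals>
    </execution>
  </executions>
</plugin>
{code}
With this in the build section, running mvn test instruments the tests via the agent, writes the execution data, and generates an HTML report that includes line and branch coverage.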
[jira] [Resolved] (HBASE-8808) Use Jacoco to generate Unit Test coverage reports
[ https://issues.apache.org/jira/browse/HBASE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju resolved HBASE-8808. --- Resolution: Fixed Hadoop Flags: Reviewed
[jira] [Updated] (HBASE-8808) Use Jacoco to generate Unit Test coverage reports
[ https://issues.apache.org/jira/browse/HBASE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju updated HBASE-8808: -- Component/s: build Affects Version/s: 0.89-fb Fix Version/s: 0.89-fb Remaining Estimate: 24h Original Estimate: 24h
[jira] [Resolved] (HBASE-8491) Fixing the TestHeapSizes.
[ https://issues.apache.org/jira/browse/HBASE-8491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju resolved HBASE-8491. --- Resolution: Fixed Fixing the TestHeapSizes. - Key: HBASE-8491 URL: https://issues.apache.org/jira/browse/HBASE-8491 Project: HBase Issue Type: Bug Affects Versions: 0.89-fb Reporter: Manukranth Kolloju Assignee: Manukranth Kolloju Priority: Trivial Fix For: 0.89-fb Accounting for the extra references added. Did an absolute count of non-static variables and updated accordingly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-8491) Fixing the TestHeapSizes.
[ https://issues.apache.org/jira/browse/HBASE-8491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju reassigned HBASE-8491: - Assignee: Manukranth Kolloju