[jira] [Commented] (HBASE-9855) evictBlocksByHfileName improvement for bucket cache
[ https://issues.apache.org/jira/browse/HBASE-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808166#comment-13808166 ] Alex Feinberg commented on HBASE-9855: -- Some comments from me (original author of this code in 89-fb): 1) This should be annotated as ThreadSafe. 2) Nit-pick (this is my own typo): "comparator specified when the class instance was constructor" -> "when the class instance was _constructed_" Addressing Ted's comments: 1) [~te...@apache.org] - re: make DefaultValueSetFactory private -- Yes, since it's a static inner class it might as well be private. 2) Depends -- you need to do ImmutableList.copyOf() for iteration. This is generally the contract of most other collections in j.u, which would throw ConcurrentModificationException. Returning the results as a set makes membership tests efficient; ImmutableList.copyOf is used for iteration as that is the cheapest way to make a copy. Other: [~xieliang007] Can you look through the findbugs warnings -- I'd think they are mostly red herrings, but I'd double-check whether the equals()/hashCode() ones are relevant. Thanks for porting this over! > evictBlocksByHfileName improvement for bucket cache > --- > > Key: HBASE-9855 > URL: https://issues.apache.org/jira/browse/HBASE-9855 > Project: HBase > Issue Type: Improvement > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HBase-9855.txt, HBase-9855-v2.txt > > > Indeed, it comes from fb's L2 cache by [~avf]'s nice work; I just did a > simple backport here. It improves a linear-time search through the whole > cache map into a log-time map lookup. > I did a small bench; it showed a bit of GC overhead, but considering the > evict-on-close triggered by frequent compaction activity, that seems reasonable? 
> And I thought about bringing an "evictOnClose" config into the BucketCache ctor and only > put/remove in the new index map while evictOnClose is true; this value > could be set by each family schema, but BucketCache is a global instance, not > per-family, so just ignore it right now... -- This message was sent by Atlassian JIRA (v6.1#6144)
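The copy-on-iterate contract Alex describes can be sketched with plain JDK collections. This is an illustrative stand-in, not the actual patch: Guava's ImmutableList.copyOf plays this role in the real code, and the class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: callers get an immutable snapshot for iteration, so
// concurrent mutation of the underlying hfilename->blocks index cannot throw
// ConcurrentModificationException mid-loop. An ArrayList copy wrapped in
// Collections.unmodifiableList is the JDK analog of ImmutableList.copyOf.
public class BlockIndexSnapshot {
    private final Map<String, List<Long>> blocksByHfile = new ConcurrentHashMap<>();

    public void addBlock(String hfileName, long offset) {
        blocksByHfile
            .computeIfAbsent(hfileName, k -> Collections.synchronizedList(new ArrayList<>()))
            .add(offset);
    }

    // Snapshot for iteration: the cheapest safe copy, per the comment above.
    public List<Long> blocksFor(String hfileName) {
        List<Long> live = blocksByHfile.get(hfileName);
        if (live == null) {
            return Collections.emptyList();
        }
        synchronized (live) {
            return Collections.unmodifiableList(new ArrayList<>(live));
        }
    }
}
```

A caller iterating the returned list is unaffected by concurrent addBlock calls, at the cost of one array copy per lookup.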
[jira] [Commented] (HBASE-9855) evictBlocksByHfileName improvement for bucket cache
[ https://issues.apache.org/jira/browse/HBASE-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807741#comment-13807741 ] Alex Feinberg commented on HBASE-9855: -- Yes, this was a big improvement. [~xieliang007] -- I am also working on digging up the JVM GC settings that I used. Feel free to put me on code review for this. - af > evictBlocksByHfileName improvement for bucket cache -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-8894) Forward port compressed l2 cache from 0.89fb
[ https://issues.apache.org/jira/browse/HBASE-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803923#comment-13803923 ] Alex Feinberg commented on HBASE-8894: -- Hi [~xieliang007] This configuration creates way too much memory pressure. I'd also suggest using Java 7 (this is the setup that I used at FB). I'll try to come up with the actual JVM options I used, but from memory: - Total JVM heap size: ~14gb (Xmx and Xms); space available to the L1 cache (the regular block cache): 0.4-0.5 (we never went above 0.5 as that caused too many problems) - New gen size: 4gb (I *think*, not too sure) - Direct memory: 10gb (I am roughly scaling down to your machine -- you want to leave some memory available to the OS) - I gave 0.9% of direct memory to the L2 cache I did use CMS, but I don't remember the CMS initiating ratio. G1 might also work. I will try to find the exact JVM configuration. How much memory did you give to the memstore? Thanks! - af > Forward port compressed l2 cache from 0.89fb > > > Key: HBASE-8894 > URL: https://issues.apache.org/jira/browse/HBASE-8894 > Project: HBase > Issue Type: New Feature >Reporter: stack >Assignee: Liang Xie >Priority: Critical > Attachments: HBASE-8894-0.94-v1.txt, HBASE-8894-0.94-v2.txt > > > Forward port Alex's improvement on hbase-7407 from 0.89-fb branch: > {code} > 1 r1492797 | liyin | 2013-06-13 11:18:20 -0700 (Thu, 13 Jun 2013) | 43 lines > 2 > 3 [master] Implements a secondary compressed cache (L2 cache) > 4 > 5 Author: avf > 6 > 7 Summary: > 8 This revision implements compressed and encoded second-level cache with > off-heap > 9 (and optionally on-heap) storage and a bucket-allocator based on > HBASE-7404. 
> 10 > 11 BucketCache from HBASE-7404 is extensively modified to: > 12 > 13 * Only handle byte arrays (i.e., no more serialization/deserialization > within) > 14 * Remove persistence support for the time being > 15 * Keep an index of hfilename to blocks for efficient eviction on close > 16 > 17 A new interface (L2Cache) is introduced in order to separate it from the > current > 18 implementation. The L2 cache is then integrated into the classes that > handle > 19 reading from and writing to HFiles to allow cache-on-write as well as > 20 cache-on-read. Metrics for the L2 cache are integrated into > RegionServerMetrics > 21 much in the same fashion as metrics for the existing (L1) BlockCache. > 22 > 23 Additionally, CacheConfig class is re-refactored to configure the L2 > cache, > 24 replace multiple constructors with a Builder, as well as replace static > methods > 25 for instantiating the caches with abstract factories (with singleton > 26 implementations for both the existing LruBlockCache and the newly > introduced > 27 BucketCache based L2 cache) > 28 > 29 Test Plan: > 30 1) Additional unit tests > 31 2) Stress test on a single devserver > 32 3) Test on a single-node in shadow cluster > 33 4) Test on a whole shadow cluster > 34 > 35 Revert Plan: > 36 > 37 Reviewers: liyintang, aaiyer, rshroff, manukranthk, adela > 38 > 39 Reviewed By: liyintang > 40 > 41 CC: gqchen, hbase-eng@ > 42 > 43 Differential Revision: https://phabricator.fb.com/D837264 > 44 > 45 Task ID: 2325295 > 7 > 6 r1492340 | liyin | 2013-06-12 11:36:03 -0700 (Wed, 12 Jun 2013) | 21 lines > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
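The numbers above translate into roughly the following hbase-env.sh fragment. This is a hedged reconstruction, not Alex's actual configuration (he notes he no longer has the exact flags); every value here is an assumption carried over from the comment, and the L1 ratio lives in hbase-site.xml rather than here.

```shell
# Hypothetical hbase-env.sh fragment approximating the settings described above.
# Treat these as starting points only; the exact flags were never recovered.
export HBASE_HEAPSIZE=14000                                   # ~14 GB; Xmx == Xms
export HBASE_OPTS="$HBASE_OPTS -Xmn4g"                        # ~4 GB new gen (uncertain)
export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize=10g"   # off-heap room for the L2 cache
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC"       # CMS was used; initiating ratio unknown
# hbase-site.xml: hfile.block.cache.size kept at 0.4-0.5 of heap for the L1 cache
```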
[jira] [Commented] (HBASE-8894) Forward port compressed l2 cache from 0.89fb
[ https://issues.apache.org/jira/browse/HBASE-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803249#comment-13803249 ] Alex Feinberg commented on HBASE-8894: -- Hi [~saint@gmail.com] [~xieliang007], I was going to reply but didn't have a chance. 1) What JVM settings are you using? These are very important. I do not recall seeing many full GCs. Let me know what parameters you pass to the JVM in regards to memory and GC. 2) What settings are you using for the L2 cache as well as the normal L1 cache? Can you paste the settings from the relevant config files? 3) Are you using JDK 6 or JDK 7? 4) What about writes? This should change write performance quite a bit -- as serialization costs are also incurred on writes. Thanks! - af > Forward port compressed l2 cache from 0.89fb -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-8894) Forward port compressed l2 cache from 0.89fb
[ https://issues.apache.org/jira/browse/HBASE-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783441#comment-13783441 ] Alex Feinberg commented on HBASE-8894: -- Vladimir, Those are very legitimate issues: 1) One approach around the on-heap keys (not an issue in my setup, as I was not using the file-based cache, but certainly an issue with Fusion-io) could be to use a hash table (with an extension array) or (in cases where the block index is not expected to fit in RAM) a b-tree over direct/memory-mapped byte buffers. This would be tricky to implement, but it has been done: https://github.com/jankotek/MapDB/tree/master/src/main/java/org/mapdb 2) The eviction algorithm is indeed primitive (and also high on the priority list of things to fix), but as far as I recall, eviction ( freeSpace() here -- https://github.com/apache/hbase/blob/0.89-fb/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java#L488-625 ) only blocks draining to the ioEngine -- in other words, while cache space is being freed you can still read from the cache (this uses striped locking) -- and writes will enter the RAMCache and be queued for the ioEngine ( https://github.com/apache/hbase/blob/0.89-fb/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java#L648-660 ). All ioEngine draining threads will be blocked during eviction, however -- that may be more problematic for file-based caches -- long draining may cause a lot of entries to build up in the RAMCache. If the queue is full the threads will be blocked, but you can configure them to wait up to a maximum amount of time -- this doesn't affect actual writes to HBase, though, as with the L2Cache writes only happen during flushes (i.e., flushes will take longer if they happen during eviction). 
Thanks, - af > Forward port compressed l2 cache from 0.89fb -- This message was sent by Atlassian JIRA (v6.1#6144)
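The striped-locking behavior mentioned in point 2 can be sketched as follows. This is an illustrative stand-in, not the BucketCache source: each key hashes to one of N independent read/write locks, so reads for different keys proceed concurrently while freeSpace() only contends with readers of the same stripe.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of striped locking: readers take the read lock of
// their key's stripe; eviction takes the write lock of only the stripes
// it is freeing, leaving the rest of the cache readable throughout.
public class StripedLocks {
    private final ReentrantReadWriteLock[] stripes;

    public StripedLocks(int numStripes) {
        stripes = new ReentrantReadWriteLock[numStripes];
        for (int i = 0; i < numStripes; i++) {
            stripes[i] = new ReentrantReadWriteLock();
        }
    }

    // Map a cache key to its stripe; mask the sign bit so the index is non-negative.
    public ReentrantReadWriteLock lockFor(Object key) {
        return stripes[(key.hashCode() & 0x7fffffff) % stripes.length];
    }
}
```

A read path would do lockFor(key).readLock().lock() around the lookup; eviction would take writeLock() per stripe it touches, rather than one global lock.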
[jira] [Commented] (HBASE-8894) Forward port compressed l2 cache from 0.89fb
[ https://issues.apache.org/jira/browse/HBASE-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783370#comment-13783370 ] Alex Feinberg commented on HBASE-8894: -- Keep in mind that this is itself based on HBASE-7404. Since I wanted to get this out the door quickly, I kept some of the package/class names similar to the original HBASE-7404 -- so we'd want to rename them. The big difference is that I've removed any kind of SerDe and changed the flow. Now the flow is: Read: 1. Check if the item is in the smaller L1 block cache (the traditional BlockCache in the JVM heap) 2. If not, check if it's in the L2 cache 3. Otherwise, go to disk Flush: 1. Write the already compressed and serialized data to the L2 cache along with disk. Basically, here the L2 cache replaces the OS page cache and allows for a smaller L1 cache. This should help performance in several ways: 1. Compared to the page cache, there's the ability (using a map I keep) to evict all the blocks associated with a given file when it's compacted. There's also no page cache pollution as a result of compaction reads or HDFS replication traffic (already a 3X gain in efficiency). The latter, however, is also true for HBASE-7404. 2. Compared to HBASE-7404, there's the ability to keep very hot blocks (both data and meta blocks) in the regular BlockCache, which becomes the L1 cache. That avoids serialization costs for those blocks, unlike keeping only meta blocks (or all blocks) in the compressed/serialized cache. Basically this gives you a "better page cache" (potentially persistent if other IO engines are introduced, finer-grained evictions/control than fadvise, etc...). The proper ratio of L1 to L2 cache (including the direct memory available for the JVM's use vs. the JVM's GC'd heap size) is still to be determined, but some math can be done on this based on things like expected cache hit ratios and the costs of hits/misses to the different caches. 
There are also a few other low-hanging fruits that could be addressed in my diff: * Sending blocks evicted from L1 directly to L2 * Evicting blocks from the L2 cache upon promotion to the L1 cache * Porting and testing the file-based IO engine (e.g., for Fusion-io cards) Thanks! - af > Forward port compressed l2 cache from 0.89fb -- This message was sent by Atlassian JIRA (v6.1#6144)
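The read flow described above (L1, then L2, then disk) can be sketched like this. All names here (readBlock, decompress, readFromHdfs) are hypothetical placeholders, not the HBase API; the point is the tiering and the optional promotion on an L2 hit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the tiered read path: the on-heap L1 holds
// deserialized blocks, the L2 holds compressed byte arrays, and only
// a miss in both falls through to HDFS.
public class TieredRead {
    private final Map<String, byte[]> l1 = new HashMap<>(); // deserialized blocks
    private final Map<String, byte[]> l2 = new HashMap<>(); // compressed blocks

    public byte[] readBlock(String key) {
        byte[] block = l1.get(key);                // 1. small on-heap L1 cache
        if (block != null) {
            return block;
        }
        byte[] compressed = l2.get(key);           // 2. compressed/serialized L2
        if (compressed != null) {
            block = decompress(compressed);        // pay the SerDe cost only here
            l1.put(key, block);                    // optionally promote hot blocks
            return block;
        }
        return readFromHdfs(key);                  // 3. disk
    }

    // Placeholders standing in for the real codec and HDFS read.
    private byte[] decompress(byte[] b) { return b; }
    private byte[] readFromHdfs(String key) { return new byte[0]; }

    public void putL2(String key, byte[] b) { l2.put(key, b); }
}
```

On flush, the already compressed and serialized HFile blocks would be written to l2 alongside disk, which is what makes the L2 a stand-in for the OS page cache.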
[jira] [Updated] (HBASE-8237) Integrate HDFS request profiling with HBase request profiling
[ https://issues.apache.org/jira/browse/HBASE-8237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Feinberg updated HBASE-8237: - Assignee: Liyin Tang (was: Alex Feinberg) > Integrate HDFS request profiling with HBase request profiling > - > > Key: HBASE-8237 > URL: https://issues.apache.org/jira/browse/HBASE-8237 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.89-fb >Reporter: Alex Feinberg >Assignee: Liyin Tang > Fix For: 0.89-fb > > > Since the building blocks to retrieve the RegionServer/DataNode profiling > data are done (in Facebook's HDFS branch -- the changes are/will be posted to > Github soon), it would be great to integrate them together, so that the HBase > client can get not only the RegionServer metrics but also the DataNode > status. It will offer the client a much clearer view from an end-to-end > perspective, including disk/network-level detail for each request. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-8237) Integrate HDFS request profiling with HBase request profiling
Alex Feinberg created HBASE-8237: Summary: Integrate HDFS request profiling with HBase request profiling Key: HBASE-8237 URL: https://issues.apache.org/jira/browse/HBASE-8237 Project: HBase Issue Type: New Feature Affects Versions: 0.89-fb Reporter: Alex Feinberg Assignee: Alex Feinberg Fix For: 0.89-fb Since the building blocks to retrieve the RegionServer/DataNode profiling data are done (in Facebook's HDFS branch -- the changes are/will be posted to Github soon), it would be great to integrate them together, so that the HBase client can get not only the RegionServer metrics but also the DataNode status. It will offer the client a much clearer view from an end-to-end perspective, including disk/network-level detail for each request. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5991) Introduce sequential ZNode based read/write locks
[ https://issues.apache.org/jira/browse/HBASE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510158#comment-13510158 ] Alex Feinberg commented on HBASE-5991: -- Oh great! Didn't know there was a hackathon going on. I actually looked at the Curator code, but found it a bit of overkill for this specific use case (particularly because we already had an implementation of the recovery logic in RecoveringZooKeeper -- so we'd either have to migrate wholesale or keep two implementations of the same code). I did borrow a few ideas from there, however (even if I didn't follow the exact logic used), so it wasn't purely from the wiki plus scratch. After I wrote this patch, we also open-sourced a library that Puma and several other apps use to handle ZK. It uses a slightly different version of RecoveringZooKeeper, however, that doesn't embed additional information into the data (like we do). https://github.com/facebook/jcommon/tree/master/zookeeper/src/main/java/com/facebook/zookeeper There are implementations of different recipes there as well. I have no strong preference on which is better; there's a lot I like about Curator (I would seriously consider using it for something I start from scratch). I'd just avoid having multiple implementations of the same ZK abstraction in the codebase. One approach could be to just implement the interfaces with Curator and then run this through the unit tests. Good luck! Feel free to put me on the diff(s). I am even more excited about what could now be done on top of these abstractions. 
- af > Introduce sequential ZNode based read/write locks > -- > > Key: HBASE-5991 > URL: https://issues.apache.org/jira/browse/HBASE-5991 > Project: HBase > Issue Type: Improvement >Reporter: Alex Feinberg >Assignee: Alex Feinberg > Fix For: 0.89-fb > > > This is a continuation of HBASE-5494: > Currently, table-level write locks have been implemented using non-sequential > ZNodes as part of HBASE-5494 and committed to the 89-fb branch. This issue is to > track converting the table-level locks to sequential ZNodes and supporting > read-write locks, so as to solve the issue of preventing schema changes during > region splits or merges. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5991) Introduce sequential ZNode based read/write locks
[ https://issues.apache.org/jira/browse/HBASE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510103#comment-13510103 ] Alex Feinberg commented on HBASE-5991: -- [~enis] Nice, thanks for jumping into this! Make sure to sync up with [~saint@gmail.com] -- he also wanted to work on the protobuf conversion/trunk patch. > Introduce sequential ZNode based read/write locks -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5991) Introduce sequential ZNode based read/write locks
[ https://issues.apache.org/jira/browse/HBASE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492584#comment-13492584 ] Alex Feinberg commented on HBASE-5991: -- Hi Matteo, It's committed to 89-fb. Stack is working to port this to trunk. Re: asynchronous: I'll have to take a look at trunk first, but couldn't unlock be done using a callback/ListenableFuture (in which case the unlock will happen in both .onFailure() and .onSuccess())? > Introduce sequential ZNode based read/write locks -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
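A minimal sketch of that callback idea, using the JDK's CompletableFuture in place of Guava's ListenableFuture. The names are hypothetical and the ReentrantLock stands in for the real ZNode-based table lock; the point is that whenComplete runs on both the success and failure paths, so the unlock fires exactly once either way -- the async analog of try/finally.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: acquire the lock, run the async operation, and
// release the lock when the operation completes -- normally or exceptionally.
public class AsyncUnlock {
    public static <T> CompletableFuture<T> withLock(ReentrantLock lock,
                                                    CompletableFuture<T> op) {
        lock.lock();
        // whenComplete fires on both .onSuccess() and .onFailure() analogs.
        return op.whenComplete((result, error) -> lock.unlock());
    }
}
```

With Guava's ListenableFuture the equivalent would be a FutureCallback whose onSuccess and onFailure both call unlock.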
[jira] [Resolved] (HBASE-6508) Filter out edits at log split time
[ https://issues.apache.org/jira/browse/HBASE-6508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Feinberg resolved HBASE-6508. -- Resolution: Fixed Done. Will be merged to 89-fb overnight. > Filter out edits at log split time > -- > > Key: HBASE-6508 > URL: https://issues.apache.org/jira/browse/HBASE-6508 > Project: HBase > Issue Type: Improvement > Components: master, regionserver, wal >Affects Versions: 0.89-fb >Reporter: Alex Feinberg >Assignee: Alex Feinberg > Fix For: 0.89-fb > > > At log splitting time, we can filter out many edits if we have a conservative > estimate of what was last persisted in each region. > This patch does the following: > 1) When a region server flushes a MemStore to an HFile, store the last flushed > sequence id for the region in a map. > 2) Send the map to the master as part of the region server report. > 3) Add an RPC call in HMasterRegionInterface to allow a region server to > query the last flushed sequence id for a region. > 4) Skip any log entry with a sequence id lower than the last flushed sequence id > for the region during log split. > 5) When a region is removed from a region server, remove the entry for > that region from the map, so that it isn't sent during the next report. > This can reduce downtime quite a bit when a regionserver goes down. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5991) Introduce sequential ZNode based read/write locks
[ https://issues.apache.org/jira/browse/HBASE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428382#comment-13428382 ] Alex Feinberg commented on HBASE-5991: -- (Please disregard the last comment; wrong diff linked.) > Introduce sequential ZNode based read/write locks -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6508) Filter out edits at log split time
Alex Feinberg created HBASE-6508: Summary: Filter out edits at log split time Key: HBASE-6508 URL: https://issues.apache.org/jira/browse/HBASE-6508 Project: HBase Issue Type: Improvement Components: master, regionserver, wal Affects Versions: 0.89-fb Reporter: Alex Feinberg Assignee: Alex Feinberg Fix For: 0.89-fb At log splitting time, we can filter out many edits if we have a conservative estimate of what was last persisted in each region. This patch does the following: 1) When a region server flushes a MemStore to an HFile, store the last flushed sequence id for the region in a map. 2) Send the map to the master as part of the region server report. 3) Add an RPC call in HMasterRegionInterface to allow a region server to query the last flushed sequence id for a region. 4) Skip any log entry with a sequence id lower than the last flushed sequence id for the region during log split. 5) When a region is removed from a region server, remove the entry for that region from the map, so that it isn't sent during the next report. This can reduce downtime quite a bit when a regionserver goes down. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
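Steps 1, 4, and 5 above can be sketched as follows. Class and method names are hypothetical, not the 89-fb implementation; the essential invariant is that any WAL edit at or below a region's last flushed sequence id is already durable in an HFile and can be skipped at split time.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the per-region flushed-sequence-id tracking
// described in the issue: record on flush, consult during log split,
// drop the entry when the region moves off this server.
public class FlushedSequenceIdTracker {
    private final ConcurrentMap<String, Long> lastFlushed = new ConcurrentHashMap<>();

    // Step 1: called after a MemStore flush completes for a region.
    public void recordFlush(String region, long seqId) {
        lastFlushed.merge(region, seqId, Math::max); // never move backwards
    }

    // Step 4: called per WAL entry during log split; false means skip the edit.
    public boolean shouldReplay(String region, long editSeqId) {
        Long flushed = lastFlushed.get(region);
        return flushed == null || editSeqId > flushed;
    }

    // Step 5: called when the region is removed from this region server.
    public void remove(String region) {
        lastFlushed.remove(region);
    }
}
```

In the real patch the map travels to the master in the region server report (step 2) and is queried over RPC at split time (step 3); this sketch only models the local bookkeeping.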
[jira] [Commented] (HBASE-5991) Introduce sequential ZNode based read/write locks
[ https://issues.apache.org/jira/browse/HBASE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427796#comment-13427796 ] Alex Feinberg commented on HBASE-5991: -- Integrated and fully working. Will add Javadoc and put up a diff shortly. > Introduce sequential ZNode based read/write locks -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5991) Introduce sequential ZNode based read/write locks
[ https://issues.apache.org/jira/browse/HBASE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427143#comment-13427143 ] Alex Feinberg commented on HBASE-5991: -- Unit test with custom timeout passing. Now working to integrate this and preparing a diff. > Introduce sequential ZNode based read/write locks > -- > > Key: HBASE-5991 > URL: https://issues.apache.org/jira/browse/HBASE-5991 > Project: HBase > Issue Type: Improvement >Reporter: Alex Feinberg >Assignee: Alex Feinberg > > This is a continuation of HBASE-5494: > Currently table-level write locks have been implemented using non-sequential > ZNodes as part of HBASE-5494 and committed to 89-fb branch. This issue is to > track converting the table-level locks to sequential ZNodes and supporting > read-write locks, as to solve the issue of preventing schema changes during > region splits or merges. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5991) Introduce sequential ZNode based read/write locks
[ https://issues.apache.org/jira/browse/HBASE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424190#comment-13424190 ] Alex Feinberg commented on HBASE-5991: -- All unit tests for the exclusion functionality are passing. A few issues remain in handling custom-specified timeouts. Will iron those out and post a diff early next week. > Introduce sequential ZNode based read/write locks > -- > > Key: HBASE-5991 > URL: https://issues.apache.org/jira/browse/HBASE-5991 > Project: HBase > Issue Type: Improvement >Reporter: Alex Feinberg >Assignee: Alex Feinberg > > This is a continuation of HBASE-5494: > Currently table-level write locks have been implemented using non-sequential > ZNodes as part of HBASE-5494 and committed to 89-fb branch. This issue is to > track converting the table-level locks to sequential ZNodes and supporting > read-write locks, as to solve the issue of preventing schema changes during > region splits or merges. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5991) Introduce sequential ZNode based read/write locks
[ https://issues.apache.org/jira/browse/HBASE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423686#comment-13423686 ] Alex Feinberg commented on HBASE-5991: -- Got tests for the write lock passing (verifying that write locks exclude other writers). Now writing tests for read locks (verifying that write locks exclude readers, but that readers do not exclude other readers). After that, the tasks are to integrate misc functionality (printing information on lock owners) into the code, clean up, and then replace DistributedLock with WriteLock and run full end-to-end tests. Will put up a diff once this is done. > Introduce sequential ZNode based read/write locks > -- > > Key: HBASE-5991 > URL: https://issues.apache.org/jira/browse/HBASE-5991 > Project: HBase > Issue Type: Improvement >Reporter: Alex Feinberg >Assignee: Alex Feinberg > > This is a continuation of HBASE-5494: > Currently table-level write locks have been implemented using non-sequential > ZNodes as part of HBASE-5494 and committed to 89-fb branch. This issue is to > track converting the table-level locks to sequential ZNodes and supporting > read-write locks, as to solve the issue of preventing schema changes during > region splits or merges. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5991) Introduce sequential ZNode based read/write locks
[ https://issues.apache.org/jira/browse/HBASE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422909#comment-13422909 ] Alex Feinberg commented on HBASE-5991: -- Mostly done in terms of implementing the locks themselves based on the recipe (with recoverable ZooKeeper). Should have this integrated into HMaster (in place of my DistributedLock code) and have a diff ready soon. > Introduce sequential ZNode based read/write locks > -- > > Key: HBASE-5991 > URL: https://issues.apache.org/jira/browse/HBASE-5991 > Project: HBase > Issue Type: Improvement >Reporter: Alex Feinberg >Assignee: Alex Feinberg > > This is a continuation of HBASE-5494: > Currently table-level write locks have been implemented using non-sequential > ZNodes as part of HBASE-5494 and committed to 89-fb branch. This issue is to > track converting the table-level locks to sequential ZNodes and supporting > read-write locks, as to solve the issue of preventing schema changes during > region splits or merges. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5991) Introduce sequential ZNode based read/write locks
[ https://issues.apache.org/jira/browse/HBASE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422437#comment-13422437 ] Alex Feinberg commented on HBASE-5991: -- Working on this right now. > Introduce sequential ZNode based read/write locks > -- > > Key: HBASE-5991 > URL: https://issues.apache.org/jira/browse/HBASE-5991 > Project: HBase > Issue Type: Improvement >Reporter: Alex Feinberg >Assignee: Alex Feinberg > > This is a continuation of HBASE-5494: > Currently table-level write locks have been implemented using non-sequential > ZNodes as part of HBASE-5494 and committed to 89-fb branch. This issue is to > track converting the table-level locks to sequential ZNodes and supporting > read-write locks, as to solve the issue of preventing schema changes during > region splits or merges. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5991) Introduce sequential ZNode based read/write locks
[ https://issues.apache.org/jira/browse/HBASE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421072#comment-13421072 ] Alex Feinberg commented on HBASE-5991: -- Just spoke to Liyin about this -- I'll work on it this week and will post an update (and hopefully a diff) by Thursday. > Introduce sequential ZNode based read/write locks > -- > > Key: HBASE-5991 > URL: https://issues.apache.org/jira/browse/HBASE-5991 > Project: HBase > Issue Type: Improvement >Reporter: Alex Feinberg >Assignee: Alex Feinberg > > This is a continuation of HBASE-5494: > Currently table-level write locks have been implemented using non-sequential > ZNodes as part of HBASE-5494 and committed to 89-fb branch. This issue is to > track converting the table-level locks to sequential ZNodes and supporting > read-write locks, as to solve the issue of preventing schema changes during > region splits or merges. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5991) Introduce sequential ZNode based read/write locks
[ https://issues.apache.org/jira/browse/HBASE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421070#comment-13421070 ] Alex Feinberg commented on HBASE-5991: -- Hi Jesse, 1) Re: progress -- I have another issue I am working on this week (related to log splitting), but let me see if I can shuffle things around and get this finished this week. I'll let you know, at the latest, sometime tomorrow (hopefully earlier) whether I can; if not, you could pick this up. 2) Re: locking the table to read only -- Sequential locks let us introduce a read-write lock for metadata, so I think it will also be possible to introduce a write lock for the data itself. Good suggestion. Thanks, - Alex > Introduce sequential ZNode based read/write locks > -- > > Key: HBASE-5991 > URL: https://issues.apache.org/jira/browse/HBASE-5991 > Project: HBase > Issue Type: Improvement >Reporter: Alex Feinberg >Assignee: Alex Feinberg > > This is a continuation of HBASE-5494: > Currently table-level write locks have been implemented using non-sequential > ZNodes as part of HBASE-5494 and committed to 89-fb branch. This issue is to > track converting the table-level locks to sequential ZNodes and supporting > read-write locks, as to solve the issue of preventing schema changes during > region splits or merges. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5991) Introduce sequential ZNode based read/write locks
[ https://issues.apache.org/jira/browse/HBASE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408832#comment-13408832 ] Alex Feinberg commented on HBASE-5991: -- Hi Ted, Sorry, I haven't followed up on this -- I have been busy. Yes, I still intend to do work on this. Unless you've started working on it, I can finish it: I've already started and have a design in mind. > Introduce sequential ZNode based read/write locks > -- > > Key: HBASE-5991 > URL: https://issues.apache.org/jira/browse/HBASE-5991 > Project: HBase > Issue Type: Improvement >Reporter: Alex Feinberg > > This is a continuation of HBASE-5494: > Currently table-level write locks have been implemented using non-sequential > ZNodes as part of HBASE-5494 and committed to 89-fb branch. This issue is to > track converting the table-level locks to sequential ZNodes and supporting > read-write locks, as to solve the issue of preventing schema changes during > region splits or merges. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-5991) Introduce sequential ZNode based read/write locks
[ https://issues.apache.org/jira/browse/HBASE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Feinberg reassigned HBASE-5991: Assignee: Alex Feinberg > Introduce sequential ZNode based read/write locks > -- > > Key: HBASE-5991 > URL: https://issues.apache.org/jira/browse/HBASE-5991 > Project: HBase > Issue Type: Improvement >Reporter: Alex Feinberg >Assignee: Alex Feinberg > > This is a continuation of HBASE-5494: > Currently table-level write locks have been implemented using non-sequential > ZNodes as part of HBASE-5494 and committed to 89-fb branch. This issue is to > track converting the table-level locks to sequential ZNodes and supporting > read-write locks, as to solve the issue of preventing schema changes during > region splits or merges. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-5991) Introduce sequential ZNode based read/write locks
[ https://issues.apache.org/jira/browse/HBASE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Feinberg reassigned HBASE-5991: Assignee: Alex Feinberg > Introduce sequential ZNode based read/write locks > -- > > Key: HBASE-5991 > URL: https://issues.apache.org/jira/browse/HBASE-5991 > Project: HBase > Issue Type: Improvement >Reporter: Alex Feinberg >Assignee: Alex Feinberg > > This is a continuation of HBASE-5494: > Currently table-level write locks have been implemented using non-sequential > ZNodes as part of HBASE-5494 and committed to 89-fb branch. This issue is to > track converting the table-level locks to sequential ZNodes and supporting > read-write locks, as to solve the issue of preventing schema changes during > region splits or merges. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5494) Introduce a zk hosted table-wide read/write lock so only one table operation at a time
[ https://issues.apache.org/jira/browse/HBASE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273756#comment-13273756 ] Alex Feinberg commented on HBASE-5494: -- Yeah, you can go ahead and close out. > Introduce a zk hosted table-wide read/write lock so only one table operation > at a time > -- > > Key: HBASE-5494 > URL: https://issues.apache.org/jira/browse/HBASE-5494 > Project: HBase > Issue Type: Improvement >Reporter: stack > Attachments: D2997.3.patch, D2997.4.patch, D2997.5.patch, > D2997.6.patch > > > I saw this facility over in the accumulo code base. > Currently we just try to sort out the mess when splits come in during an > online schema edit; somehow we figure we can figure all possible region > transition combinations and make the right call. > We could try and narrow the number of combinations by taking out a zk table > lock when doing table operations. > For example, on split or merge, we could take a read-only lock meaning the > table can't be disabled while these are running. > We could then take a write only lock if we want to ensure the table doesn't > change while disabling or enabling process is happening. > Shouldn't be too hard to add. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5494) Introduce a zk hosted table-wide read/write lock so only one table operation at a time
[ https://issues.apache.org/jira/browse/HBASE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273558#comment-13273558 ] Alex Feinberg commented on HBASE-5494: -- Created HBASE-5991 for implementation of sequential znode based read/write locks. > Introduce a zk hosted table-wide read/write lock so only one table operation > at a time > -- > > Key: HBASE-5494 > URL: https://issues.apache.org/jira/browse/HBASE-5494 > Project: HBase > Issue Type: Improvement >Reporter: stack > Attachments: D2997.3.patch, D2997.4.patch, D2997.5.patch, > D2997.6.patch > > > I saw this facility over in the accumulo code base. > Currently we just try to sort out the mess when splits come in during an > online schema edit; somehow we figure we can figure all possible region > transition combinations and make the right call. > We could try and narrow the number of combinations by taking out a zk table > lock when doing table operations. > For example, on split or merge, we could take a read-only lock meaning the > table can't be disabled while these are running. > We could then take a write only lock if we want to ensure the table doesn't > change while disabling or enabling process is happening. > Shouldn't be too hard to add. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5991) Introduce sequential ZNode based read/write locks
Alex Feinberg created HBASE-5991: Summary: Introduce sequential ZNode based read/write locks Key: HBASE-5991 URL: https://issues.apache.org/jira/browse/HBASE-5991 Project: HBase Issue Type: Improvement Reporter: Alex Feinberg This is a continuation of HBASE-5494: Currently table-level write locks have been implemented using non-sequential ZNodes as part of HBASE-5494 and committed to 89-fb branch. This issue is to track converting the table-level locks to sequential ZNodes and supporting read-write locks, as to solve the issue of preventing schema changes during region splits or merges. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5494) Introduce a zk hosted table-wide read/write lock so only one table operation at a time
[ https://issues.apache.org/jira/browse/HBASE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268107#comment-13268107 ] Alex Feinberg commented on HBASE-5494: -- bq. You thought this overkill for your case? http://zookeeper.apache.org/doc/r3.1.2/recipes.html#Shared+Locks That is fine. Do you think we could backfill it later underneath the patch attached here? I went down the non-sequential route (as you said, thinking it was overkill and a simple "create if not exist" approach would work), although I later realized that some of the potential race conditions would likely not happen if I went with their approach. I think we could backfill it later once we create read-write locks. I do like the idea of a new master coming up to finish previous work. If we make the ZNode data more machine parseable (e.g., convert it to protobuf in trunk) then this would be feasible to do (when a new master is brought up, the master scans the locks to see if there were any operations in progress when the previous master died). I agree that lock and unlock shouldn't really be public APIs (in the sense of being directly accessible to end developers) -- to that end, I'll make lockTable() and unlockTable() package-local methods. > Introduce a zk hosted table-wide read/write lock so only one table operation > at a time > -- > > Key: HBASE-5494 > URL: https://issues.apache.org/jira/browse/HBASE-5494 > Project: HBase > Issue Type: Improvement >Reporter: stack > Attachments: D2997.3.patch, D2997.4.patch > > > I saw this facility over in the accumulo code base. > Currently we just try to sort out the mess when splits come in during an > online schema edit; somehow we figure we can figure all possible region > transition combinations and make the right call. > We could try and narrow the number of combinations by taking out a zk table > lock when doing table operations. 
> For example, on split or merge, we could take a read-only lock meaning the > table can't be disabled while these are running. > We could then take a write only lock if we want to ensure the table doesn't > change while disabling or enabling process is happening. > Shouldn't be too hard to add. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
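The "machine parseable znode data" idea in the comment above can be sketched as follows. This is an assumption-laden illustration: JSON stands in for the protobuf encoding the comment suggests, and `write_lock_data`/`recover` are hypothetical names, not HBase APIs.

```python
# Sketch of storing the in-progress table operation as structured data in
# the lock znode, so a newly elected master can scan held locks and decide
# what to finish or roll back. JSON here stands in for the protobuf
# encoding proposed for trunk; all names are illustrative.
import json

def write_lock_data(op, table, master):
    # Data written into the lock znode at acquisition time.
    return json.dumps({"op": op, "table": table, "master": master}).encode()

def recover(lock_datas):
    # A new master inspects each held lock for an unfinished operation.
    return [json.loads(d) for d in lock_datas]

data = write_lock_data("ALTER", "t1", "master-a")
pending = recover([data])
print(pending[0]["op"], pending[0]["table"])
# → ALTER t1
```

With opaque, human-only znode data the new master cannot do this safely, which is why the comment ties recovery to a parseable encoding.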
[jira] [Commented] (HBASE-5494) Introduce a zk hosted table-wide read/write lock so only one table operation at a time
[ https://issues.apache.org/jira/browse/HBASE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268042#comment-13268042 ] Alex Feinberg commented on HBASE-5494: -- Re: "One thing we'd like to prevent is a table being disabled while splits (or merges) are going on. How hard would it be to add this facility (in another jira?). One way of doing it would be that a regionserver before splitting, it'd take out the table lock. That would prevent the table from being disabled. But what about the case if two regionservers try to split a region from the same table at the one time? Or, what if the regionserver dies mid-split; the lock will be stuck in place." This is an interesting question. I think one approach may be to create a region-level lock manager, and to convert the table-level lock manager to support read-write locks. Schema modifications (create/disable/alter/delete) would acquire a table-wide write lock (the exclusive lock, as now). For splits and merges, region servers would acquire a table-wide _read lock_ (to allow two region servers to split different regions of a table at the same time, but prevent schema modifications during a split/merge), and a write lock (i.e., a usual lock) over the regions that are being split (I'm not even sure if this step is needed at this point). We also need a way to handle stuck locks (currently DistributedLock uses persistent ZNodes) after crashes with minimal (if any) manual intervention -- the key thing being that whatever schema modification was started prior to the crash is safely rolled back, which may be non-trivial, as I would guess it would be more complex than just keeping a txn id in the log and then reading through the HLog for META. 
> Introduce a zk hosted table-wide read/write lock so only one table operation > at a time > -- > > Key: HBASE-5494 > URL: https://issues.apache.org/jira/browse/HBASE-5494 > Project: HBase > Issue Type: Improvement >Reporter: stack > Attachments: D2997.3.patch, D2997.4.patch > > > I saw this facility over in the accumulo code base. > Currently we just try to sort out the mess when splits come in during an > online schema edit; somehow we figure we can figure all possible region > transition combinations and make the right call. > We could try and narrow the number of combinations by taking out a zk table > lock when doing table operations. > For example, on split or merge, we could take a read-only lock meaning the > table can't be disabled while these are running. > We could then take a write only lock if we want to ensure the table doesn't > change while disabling or enabling process is happening. > Shouldn't be too hard to add. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
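The read-write scheme discussed in the comment above follows ZooKeeper's shared-locks recipe linked earlier in the thread: lock requests become sequential znodes, a reader may proceed when no write node has a lower sequence number, and a writer only when its node is the lowest of all. A minimal simulation of that grant rule on plain strings (the znode names and `can_acquire` helper are illustrative, not HBase or ZooKeeper API):

```python
# Sketch of the grant rule in ZooKeeper's shared-locks recipe: children of
# the lock node are sequential znodes named like "read-0000000003" or
# "write-0000000005". This simulates the decision on plain strings.

def seq(node):
    # Extract the sequence number ZooKeeper appends to the znode name.
    return int(node.rsplit("-", 1)[1])

def can_acquire(my_node, children):
    kind = my_node.split("-", 1)[0]
    lower = [c for c in children if seq(c) < seq(my_node)]
    if kind == "write":
        return not lower  # a writer waits for every earlier node
    # A reader only waits for earlier writers, so readers can share.
    return not any(c.startswith("write-") for c in lower)

children = ["read-0000000001", "write-0000000002", "read-0000000003"]
print(can_acquire("read-0000000001", children))   # True: no earlier writer
print(can_acquire("read-0000000003", children))   # False: behind write-...0002
print(can_acquire("write-0000000002", children))  # False: behind read-...0001
```

This is exactly why two region servers holding read locks can split different regions of the same table concurrently while a schema change, needing the write lock, waits for both.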
[jira] [Commented] (HBASE-5494) Introduce a zk hosted table-wide read/write lock so only one table operation at a time
[ https://issues.apache.org/jira/browse/HBASE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267681#comment-13267681 ] Alex Feinberg commented on HBASE-5494: -- This patch implements a ZK-hosted mutual exclusion lock (DistributedLock) and table-level locks (TableLockManager), and ensures that all schema-changing operations are serialized. Further work would be needed to add read-write locks to handle region splits and merges. > Introduce a zk hosted table-wide read/write lock so only one table operation > at a time > -- > > Key: HBASE-5494 > URL: https://issues.apache.org/jira/browse/HBASE-5494 > Project: HBase > Issue Type: Improvement >Reporter: stack > Attachments: D2997.3.patch > > > I saw this facility over in the accumulo code base. > Currently we just try to sort out the mess when splits come in during an > online schema edit; somehow we figure we can figure all possible region > transition combinations and make the right call. > We could try and narrow the number of combinations by taking out a zk table > lock when doing table operations. > For example, on split or merge, we could take a read-only lock meaning the > table can't be disabled while these are running. > We could then take a write only lock if we want to ensure the table doesn't > change while disabling or enabling process is happening. > Shouldn't be too hard to add. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
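The non-sequential DistributedLock described in the comment above boils down to "create the lock znode if it does not exist". A toy in-memory simulation of that mutex, assuming nothing about the actual HBase class (a real implementation would use ZooKeeper ephemeral nodes, and `ZnodeStore` is purely illustrative):

```python
# Toy simulation of a non-sequential "create if not exists" mutex like the
# DistributedLock described above: whoever creates the lock znode first
# holds the lock; everyone else fails until it is deleted. Purely
# in-memory; real ZooKeeper raises NodeExistsException on contention.

class ZnodeStore:
    def __init__(self):
        self.nodes = {}

    def create(self, path, owner):
        if path in self.nodes:
            return False  # lock held by someone else
        self.nodes[path] = owner
        return True

    def delete(self, path):
        self.nodes.pop(path, None)

store = ZnodeStore()
print(store.create("/hbase/table-lock/t1", "master-a"))  # True: acquired
print(store.create("/hbase/table-lock/t1", "master-b"))  # False: contended
store.delete("/hbase/table-lock/t1")                      # unlock
print(store.create("/hbase/table-lock/t1", "master-b"))  # True: now acquired
```

Because the lock is a single node rather than a queue of sequential nodes, it can only express mutual exclusion, which is why the follow-up work in HBASE-5991 moves to sequential znodes for read-write semantics.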